SPEECH RECOGNITION BASED INTERACTIVE INFORMATION
RETRIEVAL SCHEME USING DIALOGUE CONTROL TO REDUCE 
USER STRESS 



BACKGROUND OF THE INVENTION 
FIELD OF THE INVENTION 

The present invention relates to a speech recognition
based interactive information retrieval scheme aimed at
retrieving a user's intended information through a speech
dialogue with the user.

DESCRIPTION OF THE BACKGROUND ART

Computer based speech recognition processing matches a
user's input speech against a recognition target database
and calculates, as a recognition likelihood, the similarity
of the input speech to every word in the database. Current
recognition technology has a limit on the number of
recognition target words for which the recognition result
can be output within a real dialogue processing time, and a
considerable amount of time is required before a response
can be returned to the user when the number of recognition
target words exceeds this limit. Also, a lowering of the
recognition accuracy due to an increase in the number of
recognition target words is unavoidable. Moreover, the
recognition accuracy depends largely on the speaker and on
the utterance environment: surrounding noise or an
incompletely uttered input can lower the recognition
accuracy even when the recognition device has high
performance and accuracy, so that there is no guarantee of
always obtaining 100% accuracy.

The conventional speech recognition based interactive
information retrieval system carries out the recognition
processing on a user's input speech using a speech
recognition device, keeps the user waiting until the
processing is finished, and then presents the candidates
obtained as a result of the recognition to the user
sequentially in descending order of recognition likelihood,
repeating the presentation until a correct candidate is
confirmed by the user.

On the other hand, when speech is used as the interface of
an information providing service, both real time
performance and accuracy are required. When there are many
recognition target words, the target information is
classified by an attribute tree formed by a plurality of
hierarchical levels. A lower level attribute is more likely
to have a number of attribute values that exceeds the
number that can be processed within the real dialogue
processing time. In order to ascertain the user's intended
target information, an attribute value needs to be
determined at each level, but a higher level attribute
value can be determined automatically by tracing the tree
once a lower level attribute value is determined (provided
that the determined lower level attribute value and the
related higher level attribute value are in one-to-one
correspondence without any overlap). Consequently, the
target information can be expected to be ascertained in a
short time if the lower level attribute value can be
ascertained first.
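
The tracing of the attribute tree described above can be
illustrated with a minimal sketch. The place names, the
PARENT table and the function trace_up are hypothetical and
are not taken from this specification; the sketch assumes
that every lower level value has exactly one parent.

```python
# Hypothetical attribute tree: each lower level value maps to exactly one
# higher level value (the one-to-one condition stated in the text).
PARENT = {
    "Shinjuku": "Tokyo",
    "Shibuya": "Tokyo",
    "Yokohama": "Kanagawa",
    "Tokyo": "Kanto",
    "Kanagawa": "Kanto",
}


def trace_up(value, parent=PARENT):
    """Return the chain of attribute values from `value` up to the root."""
    chain = [value]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


print(trace_up("Shinjuku"))
```

Once the lowest level value is known, no further query is
needed in this sketch to obtain the higher level values.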

However, the conventional speech recognition based
interactive information retrieval system does not allow the
user to input the lower level attribute value first, in
view of recognition errors and the number of words that can
be processed within a time that does not spoil the
naturalness of the dialogue with the user. Namely, it has
been necessary to narrow the recognition target words down
to a number that can be processed within the real dialogue
processing time by first asking a query about the higher
level attribute, for which the number of attribute values
is small, and requesting input; determining the attribute
value by repeatedly presenting the candidates obtained as a
result of the recognition in descending order of
recognition likelihood until the entered attribute value
can be determined; and selecting, as the next recognition
target, only those attribute values at the next level that
are related to the determined higher level attribute value.

Such a conventional method cannot narrow down the next
level recognition target attribute values unless the higher
level attribute value is determined, so that the
presentation of candidates to the user is repeated until
the higher level attribute value is determined. However, in
this conventional method, a process comprising the
attribute value input request, the candidate presentation
and confirmation until the attribute value is determined
for each attribute, and the narrowing down of the next
level attribute values after the attribute value
determination must be repeated as many times as the number
of hierarchical levels involved in order to ascertain the
target information. This number of repetitions is greater
for target information that has deeper attribute
hierarchical levels, so that it has been difficult to
ascertain the target information efficiently.

In a system for ascertaining target information from an
information database whose number of words exceeds the
number that can be processed within the real dialogue
processing time, in order to determine the (lower level)
attribute value from which the target information can be
ascertained, the user is kept waiting during the
recognition processing, and then the confirmation process
for sequentially presenting the recognition result is
carried out. However, when it is difficult to determine the
correct attribute value smoothly due to recognition errors,
it is necessary to repeat the confirmation process many
times despite the fact that the user has already been kept
waiting, and this can make the dialogue unnatural and cause
great stress on the user.

Consequently, in the current system based on the current
speech recognition technology, it is impossible to allow
the user to start the input from the lower level attribute
value in such a way that a reasonably accurate response can
be returned without making the user wait, and it is
necessary to request the user's input sequentially from the
higher level attribute value and repeat the attribute value
determination. The recognition target words of the lower
level are narrowed down by determining the higher level
attribute value, so that the dialogue cannot proceed
further until the higher level attribute value is
determined. In other words, the confirmation process is
needed at each level until the entered attribute value can
be determined.

If the lower level attribute value can be ascertained
first, the higher level attribute value can be ascertained
automatically, so that the target information can be
ascertained efficiently. In view of this fact, the
currently used process of repeating the query, the
determination and the confirmation process for each query
sequentially from the higher level is very circuitous for
the user.

In particular, the user is forced to enter input from the
higher level because input from the lower level is not
allowed; the presentation and confirmation process must be
repeated when a correct attribute value cannot be obtained
as the top candidate due to recognition errors; and the
attribute value input and the confirmation process must be
repeated as many times as the number of hierarchical levels
involved until the target information is ascertained (the
lowest level attribute value is determined), even after
each input has been determined by several trials of the
presentation and confirmation process. Although these are
indispensable processes for the system, they appear as very
circuitous and superfluous processes to the user, who
prefers natural and short dialogues, and they cause great
stress on the user.

As a method for ascertaining the target information while
reducing stress on the user, allowing the user to start the
input from the lower level attribute value can be
considered, but this requires the determination of an
attribute value for which the number of recognition target
words exceeds the number that can be processed within the
real dialogue processing time.

Also, in computer based speech recognition processing, the
recognition of speech by unspecified speakers and of speech
uttered at an irregular utterance speed is particularly
difficult, and in addition the degradation of speech
quality due to surrounding noise or the like can make 100%
speech recognition accuracy practically impossible, so that
the instantaneous determination of a speech retrieval key
that is entered as the user's speech input is difficult.

Also, in the speech recognition based interactive
information retrieval system, in order to realize natural
dialogues with the user, it is a prerequisite for the
system to return a response to the user's input within a
time that does not appear unnatural to the human sense.
However, there is a limit to the number of words on which
speech recognition processing can be carried out within a
prescribed period of time. For this reason, when the
recognition target is a large scale database having a
number of words that cannot be processed within a
prescribed period of time, it is difficult to achieve the
task requested by the user within a prescribed period of
time through natural dialogues between the user and the
system, without making the user conscious of the processing
time required for the information retrieval at the time of
the speech recognition processing by the system or of the
incompleteness of the speech recognition accuracy of the
system.

Consequently, it is necessary to keep the user waiting
while the system produces the recognition processing
result, and when the presented result turns out to be a
recognition error it is necessary to keep the user waiting
further until another recognition result is presented, so
that it is difficult, with the current speech recognition
technology, to construct a system using speech as the input
interface that has both quickness and accuracy equivalent
to a human operator based system.

Also, in the conventional retrieval method aimed at
determining the retrieval key requested by the user with
respect to a large scale database that cannot be processed
in real time, because of the limit on the number of data on
which speech recognition processing can be carried out in
real time, the user is not allowed to enter the requested
retrieval key immediately. Instead, the user is urged to
enter a retrieval assist key that can lead to the narrowing
down of the retrieval key candidates, such that the
recognition targets can be reduced from the entire large
scale database to a number of data that can be processed in
real time.

Here, the retrieval assist keys are selected such that
they are formed by a number of data that can be processed
in real time, each retrieval key to be requested by the
user always has one retrieval assist key as its higher
level key, the retrieval assist key (higher level key) of
the retrieval key to be requested is simple and obvious to
the user, and the lower level keys (the retrieval keys to
be requested by the user) belonging to one retrieval assist
key are formed by a number of data that can be processed in
real time, so as to enable the determination of the
retrieval key.

Also, in the conventional retrieval method aimed at
determining the retrieval key requested by the user using
the speech input, the speech recognition processing with
respect to the retrieval assist key (higher level key) is
carried out first, and the obtained retrieval assist key
(higher level key) candidates are presented to the user
sequentially in descending order of recognition likelihood
until a response indicating that a candidate is correct is
obtained. When the retrieval assist key is determined, the
retrieval key (lower level key) candidates having the
determined retrieval assist key as their higher level key
are extracted as the recognition target data, and the user
is urged to input the retrieval key (lower level key) that
the user really wants to request. As with the retrieval
assist key, the retrieval key is determined by presenting
the retrieval key candidates obtained by the speech
recognition processing to the user sequentially in
descending order of recognition likelihood until a response
indicating that a candidate is correct is obtained.
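
The conventional confirmation process described above can be
sketched as follows. This is only an illustration: the
function confirm_in_order, the likelihood values and the
is_correct callback (standing in for the user's yes/no
answer) are assumptions introduced here.

```python
def confirm_in_order(candidates, is_correct):
    """Present candidates in descending order of recognition likelihood.

    Returns the confirmed word and the number of confirmations it took,
    or (None, number asked) if the user rejected every candidate.
    """
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    for asked, (word, _likelihood) in enumerate(ranked, start=1):
        if is_correct(word):
            return word, asked
    return None, len(ranked)


# Hypothetical recognition result for one utterance.
result = {"Tokyo": 0.5, "Kyoto": 0.4, "Osaka": 0.1}
print(confirm_in_order(result, lambda word: word == "Kyoto"))
```

In the conventional method this loop runs once for the
retrieval assist key and once more for the retrieval key, so
the confirmations of both stages add up.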

As such, the current speech recognition technology has a
limit on the number of words for which the matching against
the speech recognition database, the recognition likelihood
calculation and the recognition result output can be
carried out in real time, so that a longer recognition time
is required when the number of recognition target words is
increased. In a speech retrieval system using speech as the
input interface, when the recognition target is a large
scale database, keeping the user waiting during the speech
recognition processing by the system can cause stress on
the user, so that the current system narrows down the
recognition target by utilizing the attribute values of the
attribute items that each recognition target data has, so
as to be able to output the recognition result in real
time.

However, with the current speech recognition technology,
100% speech recognition accuracy cannot be attained even
when the recognition target is narrowed down to a number of
words that can be processed in real time. In particular,
the recognition of speech by unspecified speakers, speech
uttered at an irregular utterance speed, and speech uttered
in a noisy environment is especially difficult, so that the
confirmation process for confirming the recognition result
with the user is indispensable in order to ascertain the
input speech. The confirmation process is a process for
presenting the recognition candidates obtained by the
speech recognition processing to the user sequentially in
descending order of recognition likelihood. The number of
confirmation processes becomes larger as the input speech
recognition accuracy becomes poorer. However, the user
expects the input interface to respond in a manner
equivalent to a human operator, so that the repeated
confirmation processes can cause stress on the user.

In the current speech recognition based interactive
information retrieval system using a large scale database
as the recognition target, the user is first urged to input
an attribute value for an attribute item in order to narrow
down the recognition target to a number that can be
processed in real time, and is then urged to input the
requested retrieval key once the recognition target has
been narrowed down according to the attribute values, so
that the confirmation process is required for both the
attribute value and the retrieval key. The attribute value
input is an indispensable process for realizing the real
time recognition processing from the viewpoint of the
system, but it is circuitous for the user because the
retrieval key that the user really wants to request cannot
be entered immediately, and the confirmation processes are
repeated twice, once for the attribute value determination
and once for the retrieval key determination, which causes
further stress on the user.

Also, the retrieval system using speech as the input
interface and having a large scale database as the
recognition and retrieval target aims at providing quick
and accurate responses to the user, such that the user may
have the illusion of a dialogue with a human operator, so
that it has been necessary to adopt a query format that
effectively narrows down the number of recognition target
words for the system, such that the recognition processing
time and the recognition accuracy can be compensated. For
this reason, the retrieval assist key that can lead to the
narrowing down of the retrieval key is determined first,
without allowing the immediate input of the retrieval key
that the user really wants to request. However, the user is
forced to enter the retrieval assist key first rather than
the retrieval key that the user really wants to request,
and is urged to enter the retrieval key only after the
retrieval assist key is determined, so that this process
may appear to the user as a superfluous process (although
it is an indispensable process for the system) forced upon
the user before the input of the retrieval key that the
user really wants to request, and can cause stress on the
user.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to
provide a speech recognition based interactive information
retrieval scheme capable of ascertaining the target
information by determining the attribute values without
making the user conscious of the time required for the
speech recognition processing and the retrieval, and
without causing unnatural dialogues with the user due to
incompleteness of the speech recognition processing. In
this scheme, in a process for determining the attribute
value necessary in order to ascertain the target
information, the recognition target attribute value can be
determined even when the number of attribute values exceeds
the number that can be processed within a prescribed period
of time, by utilizing a method for narrowing down the
recognition target words that can return a response with a
level of accuracy tolerable to the user without giving the
user a feeling of being kept waiting, and a method for
ascertaining input that can reduce or omit the confirmation
processes.

It is another object of the present invention to provide
an operator-less speech recognition based interactive
information retrieval scheme using speech dialogues based
on dialogue control, which is capable of determining the
retrieval key entered by the user through natural
dialogues. In this scheme, the retrieval key can be
determined using a large scale database having retrieval
target words that cannot be processed within a prescribed
period of time, without making the user conscious of the
time required for the speech recognition processing and the
database matching, and without causing unnatural dialogues
with the user due to incompleteness of the speech
recognition processing, such that the task of determining
the speech retrieval key entered by the user can be
achieved in the operator-less speech recognition based
interactive information retrieval system, without making
the user conscious of the waiting time, through dialogues
that have both quickness and naturalness equivalent to a
human operator based system.

It is another object of the present invention to provide a
speech recognition based interactive information retrieval
scheme using a large scale database as the recognition
target, which is capable of ascertaining a retrieval key
entered by speech input while reducing stress on the user.
In this scheme, the retrieval key is ascertained without
carrying out the attribute value determination, such that
the confirmation process for the purpose of determining the
attribute value is eliminated and the circuity due to that
confirmation process is eliminated, while the processing
time required for the retrieval key determination is
shortened.

It is another object of the present invention to provide a
speech recognition based interactive information retrieval
scheme capable of realizing retrieval that has both
quickness and naturalness in determining the retrieval key
from a large scale database. In this scheme, in the
retrieval aimed at determining the retrieval key entered by
the user using the large scale database as the recognition
target, the recognition and the retrieval are carried out
without making the user conscious of the waiting time and
of the incompleteness of the recognition accuracy, even
when the retrieval key that the user really wants to
request is entered immediately at the beginning, by
utilizing the bias in the access frequencies of data in the
large scale database.

First, in the first scheme of the present invention, at
the time of determining the attribute value of an attribute
for which the number of attribute value candidates in the
information database exceeds the number that can be
processed within the real dialogue processing time,
importance levels are assigned to the set of recognition
target attribute values (recognition target words) of that
attribute according to the bias of the past access
frequencies or the like, and the recognition processing is
carried out with priority for data with a higher importance
level, in order to return a response having a level of
accuracy tolerable to the user within a time in which the
user does not sense any stress or unnaturalness in response
to the input of the retrieval target attribute.

Namely, as many attribute values as can be processed by
the speech recognition device within the real dialogue
processing time (a number specified by the system, which is
assumed to be N) are selected as the prioritized
recognition target words according to the importance
levels, and the speech recognition processing is carried
out at a higher priority for these prioritized recognition
target words.
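
The selection of the N prioritized recognition target words
can be sketched as below. The sketch assumes that the
importance level is simply the past access count; the
function name split_by_importance and the sample counts are
hypothetical.

```python
def split_by_importance(access_counts, n):
    """Split words into (prioritized, non_prioritized) lists.

    `access_counts` maps each recognition target word to its past access
    count, used here as the importance level (an assumption). The n most
    important words become the prioritized recognition target words.
    """
    ranked = sorted(access_counts, key=lambda w: access_counts[w], reverse=True)
    return ranked[:n], ranked[n:]


counts = {"a": 5, "b": 50, "c": 20, "d": 1}
prioritized, remaining = split_by_importance(counts, n=2)
print(prioritized, remaining)
```

The remaining list keeps its descending order of importance,
which is the order in which the non-prioritized words would
later be supplied to the recognition device.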

Then, based on, for example, a comparison between a
prescribed threshold and the recognition likelihood that is
calculated from the recognition result for each attribute
value candidate, when a prescribed condition for judging
that the attribute value can be ascertained by the
confirmation process with the user alone is satisfied, the
confirmation process for presenting the result to the user
is attempted.
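
One possible form of the prescribed condition is sketched
below. The text only says that the condition is based on a
comparison with a threshold, so the concrete rule used here
(at least one and at most max_candidates candidates reach
the threshold), the parameter names and the default values
are all assumptions made for illustration.

```python
def can_confirm(likelihoods, threshold=0.8, max_candidates=1):
    """Judge whether the confirmation process alone is likely to succeed.

    Assumed rule: confirmation is attempted only when a small, non-empty
    group of candidates has a recognition likelihood at or above the
    threshold.
    """
    leading = [word for word, p in likelihoods.items() if p >= threshold]
    return 0 < len(leading) <= max_candidates


print(can_confirm({"x": 0.9, "y": 0.3}))   # one clear leading candidate
print(can_confirm({"x": 0.5, "y": 0.4}))   # nothing reaches the threshold
```

When the function returns False, the dialogue would move on
to a related information query instead of presenting
candidates one after another.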

In the recognition processing for the prioritized
recognition target words, the prioritized recognition
target words are formed by those attribute values that have
a higher possibility of being accessed, taken from the
attribute values of the attribute whose number exceeds the
number that can be processed within the real dialogue
processing time, so that an appropriate recognition result
can be presented at this point in many cases for most
users.

When the above condition for judging that the attribute
value can be ascertained by the confirmation process alone
is not satisfied, either the target attribute value is not
contained in the prioritized recognition target words, or
the accuracy of the recognition device is so poor that the
correct value was not obtained among the leading
candidates. In this case, the dialogue proceeds to a
related information query, in which, for example, another
hierarchically adjacent attribute is queried, and the
attribute value is determined by cross-checking the
recognition result of the other attribute against the
earlier recognition result, such that the conventionally
used repetition of the confirmation processes starting from
the leading candidates is eliminated and the user stress is
thereby eliminated.
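
The cross-checking step can be sketched as a filter over the
earlier candidates. The sketch assumes that the related
information is the hierarchically adjacent higher level
value; the function cross_check, the parent table and the
sample names are hypothetical.

```python
def cross_check(candidates, related_candidates, parent):
    """Keep only candidates consistent with the related information answer.

    `candidates` maps lower level values to recognition likelihoods,
    `related_candidates` is the set of higher level values recognized from
    the related information query, and `parent` maps each lower level
    value to its higher level value (all hypothetical structures).
    """
    return {
        word: p
        for word, p in candidates.items()
        if parent.get(word) in related_candidates
    }


parent = {"Shinjuku": "Tokyo", "Shinjo": "Yamagata", "Shingu": "Wakayama"}
earlier = {"Shinjuku": 0.45, "Shinjo": 0.40, "Shingu": 0.35}
print(cross_check(earlier, {"Yamagata", "Yamanashi"}, parent))
```

Here three acoustically similar candidates are reduced to
one without asking the user to reject the others in turn.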

One of the features of this first scheme is that the
dialogue proceeds to the related information query, without
notifying the user that the processing up to this point has
been based only on the recognition result for the
prioritized recognition target words, while the recognition
processing for the non-prioritized recognition target words
is carried out in parallel by utilizing the related
information query dialogue time, in order to deal with the
case where the target attribute value is contained in the
remaining non-prioritized recognition target words. When
the recognition processing for a response to the related
information query is carried out and the recognition result
is obtained, the recognition results for only those
non-prioritized recognition target words for which the
recognition processing has been finished by then in the
parallel recognition processing are added to the
recognition result of the prioritized recognition target
words, and the recognition result is narrowed down by
referring to the relevancy with the recognition result of
the related information query response.

Here, when the non-prioritized recognition target words
comprise a number of words that exceeds the number (N) that
can be processed within the real dialogue processing time,
the recognition processing for the non-prioritized
recognition target words is still not completed by the time
a response to one related information query is obtained,
and the user would have to be kept waiting if the
recognition processing were continued up to its completion.
In such a case, the non-prioritized recognition target
words are subdivided into a plurality of sets each having N
words. Then, the recognition processing is carried out by
supplying each set of the non-prioritized recognition
target words to the recognition device as the next
recognition target words, sequentially in descending order
of the importance level. Then, the recognition result for
each set of non-prioritized recognition target words that
has been processed by the time a response to the related
information query is entered by the user is added to the
recognition result obtained so far.
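
The subdivision into sets of N words and the merging of the
sets finished so far can be sketched as follows. The
function names, the toy recognizer and the sets_finished
parameter (which stands in for however many sets the
recognition device completed during the dialogue time) are
assumptions for illustration only.

```python
def split_into_sets(words_by_importance, n):
    """Subdivide words (already in descending importance) into sets of at most n."""
    return [words_by_importance[i:i + n]
            for i in range(0, len(words_by_importance), n)]


def merge_finished_sets(results_so_far, sets, recognize, sets_finished):
    """Add the results of the sets processed before the user's answer arrived.

    `recognize` is a hypothetical callable that maps a list of words to a
    dict of recognition likelihoods for one utterance.
    """
    merged = dict(results_so_far)
    for word_set in sets[:sets_finished]:
        merged.update(recognize(word_set))
    return merged


sets = split_into_sets(["c", "d", "e", "f", "g"], n=2)
toy_recognizer = lambda words: {w: 0.1 for w in words}
print(merge_finished_sets({"a": 0.4}, sets, toy_recognizer, sets_finished=1))
```

Sets that were not reached stay queued and can be processed
during the next related information query.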

Such a related information query has the effect of
realizing a natural dialogue in which the user answers a
question that seems natural, rather than a superfluous
process such as the waiting time or the repeated
confirmation process. On the other hand, from the viewpoint
of the system, the related information query dialogue time
can be utilized as the recognition time for the
non-prioritized recognition target words, and in addition,
if related information that can lead to the narrowing down
of the attribute value to be determined is obtained from
the relevancy among the attribute values, this obtained
related information can be utilized as information for
narrowing down the attribute value.

Then, whether or not the condition for judging that the
target attribute value can be ascertained by the
confirmation process alone is satisfied is checked again
with respect to the result obtained by cross-checking the
result of the related information query against the earlier
recognition result. If this condition is satisfied, the
confirmation process is attempted; otherwise, another
related information query is made.






If the recognition processing for the non-prioritized
recognition target words has not been completed yet, the
recognition processing is continued by utilizing the
related information query dialogue time, in order to deal
with the case where the target attribute value is contained
in those attribute values for which the recognition
processing has not been carried out yet. When there is no
more related information to be queried, further recognition
processing time for the non-prioritized recognition target
words is gained by, for example, repeating similar related
information queries several times or presenting the
recognition result of the related information query
response in order to obtain more accurate related
information.
In this first scheme, the dialogue proceeds in such a way
that the user remains totally unaware of the internal
processing state of the system, so that it is possible to
realize the attribute value determination and the
ascertaining of the target information through a flow of
natural dialogues. Namely, according to this first scheme,
it becomes possible to make it appear to the user as if the
system is carrying out the recognition processing for all
the attribute values and returning a response according to
that recognition result. The dialogue proceeds to the
related information query such that the user remains
unaware of the fact that the first response is actually
returned according to the recognition result for the
prioritized recognition target words only, and of the fact
that the target attribute value may not necessarily be
contained in the prioritized recognition target words.

Then, by cross-checking the result of the related
information query while adding the recognition result for
the non-prioritized recognition target words that is
obtained by the gradually continued recognition processing,
it is possible to maintain natural dialogues with the user
while determining the input attribute value and
ascertaining the target information within an appropriate
time, even with respect to recognition target words that
exceed the number that can be processed within the real
dialogue processing time, without causing the user to feel
unnaturalness or stress.

According to this first scheme, it becomes possible to
allow the user to enter the lower level attribute value
immediately, which seems a natural and efficient way of
ascertaining the target information from the user's
perspective, and moreover the inadvertent repetition of the
confirmation process is avoided, so that a reduction of the
stress on the user can be expected. In addition, it is
possible to realize an interactive information retrieval
process that has both high accuracy and naturalness and
that does not make the user conscious of the waiting time
or of the incompleteness of the recognition accuracy.

Next, in the second scheme of the present invention,
importance levels are assigned to data in the speech
recognition database having a large number of speech
recognition target words that cannot be processed within a
prescribed time, according to statistical information
such as past access frequencies or utilization frequencies.
Then, a plurality of statistically hierarchized databases
are formed from partial databases created by selecting
respectively defined prescribed numbers of data
sequentially from data having higher importance levels, and
hierarchically structuring these partial databases such
that a lower level partial database contains a larger
number of data and the lowest level partial database
contains all data of the speech recognition database. These
statistically hierarchized databases are specifically
designed to maintain the naturalness of the dialogue to be
carried out between the user and the system in order to
achieve the task.
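
The construction of the statistically hierarchized databases described above can be illustrated with a minimal Python sketch. The function name build_hierarchy, the use of raw access counts as importance levels, and the sample words and sizes are assumptions made for illustration only; the specification does not prescribe them.

```python
from typing import Dict, List


def build_hierarchy(access_counts: Dict[str, int],
                    level_sizes: List[int]) -> List[List[str]]:
    """Return partial databases ordered from the highest to the lowest level.

    access_counts: word -> past access frequency (used here as the
                   importance level; an assumption of this sketch).
    level_sizes:   prescribed number of entries for each level except the
                   lowest, in increasing order.
    """
    # Rank words by importance, most frequently accessed first
    # (ties broken alphabetically so the result is deterministic).
    ranked = sorted(access_counts, key=lambda w: (-access_counts[w], w))
    # Each higher level takes the top-N most important words.
    levels = [ranked[:size] for size in level_sizes if size < len(ranked)]
    # The lowest level always contains all data of the database.
    levels.append(ranked)
    return levels


# Hypothetical sample data.
counts = {"Tokyo": 900, "Osaka": 500, "Nagoya": 300,
          "Sapporo": 120, "Naha": 40, "Wakkanai": 5}
levels = build_hierarchy(counts, level_sizes=[2, 4])
```

In this sketch each level is a superset of the level above it, so a word missed at a higher level can still be found after shifting to a lower one.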

Here, the real time performance is realized virtually
by utilizing differences between the processing times for
different levels due to differences in the number of data
contained at different levels. Namely, the speech
recognition processing and the speech retrieval key
candidate extraction based on the speech recognition
likelihood are carried out in parallel for different levels
of the statistically hierarchized databases, and the
dialogue leading with respect to the user is carried out
sequentially for different levels, starting from the
highest level statistically hierarchized database for which
the processing is finished first, while continuing to
process the other levels.
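
The parallel processing with sequential dialogue leading described above might be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the toy recognize function uses string similarity from Python's difflib in place of a real speech recognizer, and the names recognize and recognize_all_levels are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def recognize(utterance, words):
    """Toy stand-in for a recognizer: score every word against the input."""
    scored = [(w, SequenceMatcher(None, utterance, w).ratio()) for w in words]
    return sorted(scored, key=lambda pair: -pair[1])


def recognize_all_levels(utterance, levels):
    """Start recognition for every level at once; yield results level by level."""
    with ThreadPoolExecutor(max_workers=len(levels)) as pool:
        futures = [pool.submit(recognize, utterance, words) for words in levels]
        for level_index, future in enumerate(futures):
            # The highest level holds the fewest words and tends to finish
            # first, so the dialogue can start on it while the larger
            # levels are still being processed in the background.
            yield level_index, future.result()


# Hypothetical two-level hierarchy; level 0 is the highest level.
levels = [["tokyo", "osaka"], ["tokyo", "osaka", "nagoya", "sapporo"]]
results = list(recognize_all_levels("osaka", levels))
```

Results are consumed in level order rather than completion order, which mirrors the text: the dialogue always begins with the highest level and only later refers to the lower ones.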

The statistically hierarchized databases used in this
second scheme are retrieval key attribute databases that
maintain attribute values of the attribute items expressing
features of each data as the related attribute information,
with respect to all data of the retrieval target speech
recognition database. The related attribute information is
utilized at a time of carrying out the retrieval key
determination related query in which the related attribute
information of the speech retrieval key is queried in order
to narrow down the speech retrieval key in this scheme.

Also, in this second scheme, in order to narrow down
candidates from the speech retrieval key leading
candidates, when a plurality of related attribute
information candidates obtained from the retrieval key
determination related query and the speech retrieval key
leading candidates to be narrowed down are found to be
related by referring to the retrieval key attribute
database, the retrieval key recognition likelihood and the
related information recognition likelihood are normalized
and multiplexed so as to realize the candidate
determination.
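
One possible reading of this normalization and combination step is sketched below. The text does not define the normalization or the combining operation, so this sketch assumes sum-to-one normalization and a simple product of the two normalized likelihoods; the names normalize, cross_check and attribute_of, and the sample data, are hypothetical.

```python
def normalize(likelihoods):
    """Scale likelihoods so that they sum to 1 (one possible normalization)."""
    total = sum(likelihoods.values())
    return {k: v / total for k, v in likelihoods.items()}


def cross_check(key_likelihoods, attr_likelihoods, attribute_of):
    """Combine key and related-attribute likelihoods for related pairs only.

    attribute_of plays the role of the retrieval key attribute database:
    it maps each retrieval key to its related attribute value.
    """
    keys = normalize(key_likelihoods)
    attrs = normalize(attr_likelihoods)
    combined = {}
    for key, p_key in keys.items():
        attr = attribute_of.get(key)
        if attr in attrs:  # the key is related to a recognized attribute
            combined[key] = p_key * attrs[attr]  # assumed: simple product
    return sorted(combined.items(), key=lambda kv: -kv[1])


# Hypothetical leading candidates and related attribute candidates.
key_likelihoods = {"Nagano": 0.6, "Nagoya": 0.5, "Naha": 0.1}
attribute_of = {"Nagano": "Nagano Prefecture",
                "Nagoya": "Aichi Prefecture",
                "Naha": "Okinawa Prefecture"}
attr_likelihoods = {"Aichi Prefecture": 0.8, "Akita Prefecture": 0.2}
ranked = cross_check(key_likelihoods, attr_likelihoods, attribute_of)
```

Here "Nagano" has the highest raw likelihood, but only "Nagoya" is related to a recognized attribute candidate, so it alone survives the cross-check.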

This second scheme realizes the speech retrieval key
determination in a speech recognition based interactive
information retrieval apparatus aiming at the speech
retrieval key determination for which the retrieval target
is the speech recognition database having a large number of
speech recognition target words for which the speech
recognition processing and the database matching cannot be
carried out within a prescribed period of time that can
maintain the naturalness of the dialogues to be carried out
between the user and the system for the purpose of the
speech retrieval key determination. Here, the speech
retrieval key determination is realized without making the
user conscious of the time required for the speech
recognition processing and the database matching, or of the
incompleteness of the speech recognition accuracy, just as
in a human operator based system, by using a dialogue
control that primarily accounts for the naturalness in the
dialogue with the user.

In the speech recognition based interactive
information retrieval method of this second scheme, because
the retrieval target database is of large scale, the
retrieval target database is maintained in a form of a
plurality of statistically hierarchized databases that are
hierarchically structured according to the importance
levels, and the number of data contained in the
statistically hierarchized database at each level is
designed such that the speech recognition, the retrieval
key recognition likelihood calculation, and the speech
recognition result table formation for the (n+1)-th level
can be finished while the dialogue for determining the
speech retrieval key according to the recognition result
for the n-th level is carried out with the user. By
utilizing differences in the processing times due to
differences in the number of data contained at different
levels, the speech recognition processing and the
recognition candidate output are virtually realized within
a prescribed period of time that does not make the user
feel unnaturalness.
Namely, the speech recognition processing for
different levels of the statistically hierarchized
databases is carried out in parallel and the speech
retrieval key candidates are extracted separately at each
level. Then, utilizing the fact that the speech recognition
processing for the highest level statistically hierarchized
database that contains the smallest number of data
representing the speech retrieval key candidates with the
statistically high importance levels can be finished first,
the speech recognition result table is sequentially
referred to starting from that of the highest level
statistically hierarchized database, and a method for
leading the dialogue with the user is determined according
to the number of speech retrieval key leading candidates
that exceed a prescribed likelihood threshold. In this way,
the dialogue between the user and the system can be made as
natural as the dialogue between human beings without making
the user conscious of incompleteness of the speech
recognition accuracy.

When the number of speech retrieval key leading
candidates is less than or equal to a prescribed number but
not zero, the retrieval key determination related query for
narrowing down the candidates from the leading candidates
is carried out, and the speech retrieval key leading
candidate which is found to be related to the obtained
related attribute information candidates is determined as
the speech retrieval key and presented to the user.

When the number of the speech retrieval key leading
candidates is greater than the prescribed number or zero,
or when the speech retrieval key presented to the user
above is negated by the user as not a correct one, or when
no speech retrieval key leading candidate is found to be
related to the related attribute information candidates
obtained by the above described retrieval key determination
related query, there is a possibility that the target
speech retrieval key is not contained in the highest level
statistically hierarchized database, so that the retrieval
target is shifted to the next level (lower level)
statistically hierarchized database for which the speech
recognition processing is already finished at this point.
Here, however, the user remains unaware of the fact that
the retrieval target database has been shifted to the lower
level one.
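
The branching on the number of leading candidates described in the preceding paragraphs could be sketched as below. The function name next_dialogue_step, the returned labels, and the sample values are hypothetical; the sketch covers only the count-based branch, not the cases where the user negates a presented key or no related candidate is found.

```python
def next_dialogue_step(candidates, threshold, max_leading):
    """Decide how to lead the dialogue for one level's recognition result.

    candidates:  list of (retrieval_key, likelihood) pairs for this level.
    threshold:   prescribed likelihood threshold.
    max_leading: prescribed number of leading candidates.
    """
    # Leading candidates are those whose likelihood exceeds the threshold.
    leading = [key for key, score in candidates if score > threshold]
    if 0 < len(leading) <= max_leading:
        # Few enough candidates: narrow them down with a related query.
        return "related_query", leading
    # Zero or too many leading candidates: the target key may not be
    # contained at this level, so the next (lower) level is used instead.
    return "shift_to_lower_level", leading


step, leading = next_dialogue_step(
    [("Nagoya", 0.9), ("Nagano", 0.7), ("Naha", 0.1)],
    threshold=0.5, max_leading=3)
```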

When the retrieval target database is shifted to the
lower level one, if the speech retrieval key presented to
the user above is negated by the user as not a correct one,
or no speech retrieval key leading candidate is found to be
related to the related attribute information candidates
obtained by the above described retrieval key determination
related query, the related attribute information candidates
already obtained by the retrieval key determination related
query are utilized again, or if the number of the speech
retrieval key leading candidates is greater than the
prescribed number or zero, the retrieval key determination
related query is newly carried out, and then the obtained
related attribute information is utilized to carry out the
cross-checking of the recognition likelihood for those
candidates which are found to be related to the related
attribute information candidates among the speech retrieval
key candidates in this second level statistically
hierarchized database that is the current recognition
target, so as to determine a new recognition likelihood.

Once again, the number of the speech retrieval key
leading candidates is checked and if it is less than or
equal to the prescribed number but not zero, the retrieval
key determination related query for asking another related
attribute information is carried out, the speech retrieval
key leading candidates in this second level statistically
hierarchized database are narrowed down by additionally
utilizing the newly obtained related attribute information
candidates, and the speech retrieval key having the
highest retrieval key recognition likelihood after the
cross-checking of the recognition likelihoods is presented
to the user similarly as in the case of the highest level
statistically hierarchized database.

When the recognition result of the second level
statistically hierarchized database is such that the number
of the speech retrieval key leading candidates is greater
than the prescribed number or zero, or the speech retrieval
key presented to the user above is negated by the user as
not a correct one, or no speech retrieval key leading
candidate is found to be related to the related attribute
information candidates obtained by the above described
retrieval key determination related query, the retrieval
target is shifted to the next level (third level)
statistically hierarchized database and the dialogue
leading is repeated similarly as in the case of the highest
level statistically hierarchized database, until the speech
retrieval key is determined.

In the dialogue leading in the case where the number
of the speech retrieval key leading candidates is less than
or equal to the prescribed number but not zero at each
level, the reliability of the retrieval key recognition
likelihoods of the leading candidates is increased by
carrying out the retrieval key determination related query
so as to narrow down the candidates effectively. In the
dialogue leading in the case of shifting the retrieval
target database to the lower level, the number of the
speech recognition target words is greater in the lower
level so that the degradation of the recognition accuracy
can be expected, but by accounting for the relevancy with
respect to all the related attribute information candidates
obtained up until the timing for shifting the retrieval
target to the lower level and narrowing down the candidates
using a combination of more information, it is possible to
compensate for the degradation of the recognition accuracy
due to the increased number of data.

Also, the speech recognition based interactive
information retrieval method of this second scheme attempts
the speech retrieval key determination using the related
attribute information of the speech retrieval key, because
the speech retrieval key determination at 100% accuracy is
impossible given that the speech recognition accuracy is
not 100%. However, the related attribute information is
also obtained by carrying out the speech recognition with
respect to a response to the retrieval key determination
related query so that the related attribute information
also cannot be obtained at 100% accuracy.

For this reason, the recognition likelihoods of the
speech retrieval key candidates and the related attribute
information candidates are normalized and cross-checked in
order to compensate for incompleteness of the speech
recognition accuracy, and the dialogue control that
primarily accounts for the naturalness is used while
narrowing down the candidates by carrying out the retrieval
key determination related query, such that the speech
retrieval key candidates are narrowed down without making
the user conscious of incompleteness of the speech
recognition accuracy.

By carrying out the dialogue with the user according
to the dialogue control utilizing the hierarchical
structure of the speech recognition database and the
normalization and the cross-checking of the speech
recognition likelihoods, it becomes possible to realize the
interactive information retrieval that has both high
accuracy and naturalness similar to the human operator
based system, without making the user conscious of the
waiting time and incompleteness of the speech recognition
accuracy.

Next, in the third scheme of the present invention,
the narrowing down of the recognition target is realized
without determining the attribute value uniquely in the
process for realizing the speech recognition processing and
the retrieval key determination in real time, by urging the
user to enter the attribute value of the attribute item of
the retrieval key and narrowing down the recognition target
according to the entered attribute value, in view of the
fact that the speech recognition database has the
recognition target words that cannot be processed in real
time.

In this third scheme, similarly as in the conventional
scheme, the retrieval key candidates are classified into
groups each containing the number of words that can be
processed in real time, by utilizing the attributes of the
recognition target retrieval key candidates in the speech
recognition database, and the recognition target is
narrowed down by asking the user for the attribute of the
requested retrieval key in order to limit the recognition
target group, so as to realize the speech recognition
processing and the retrieval key determination in real
time. At this point, the entered attribute value is not
determined uniquely because the current speech recognition
accuracy is not 100%, so that the attribute value
candidates are outputted in a descending order of the
recognition likelihood obtained as a result of the speech
recognition processing for the attribute value.

In this third scheme, however, the confirmation
process for uniquely determining the attribute value is not
carried out, and the attribute values that have the
recognition likelihood greater than or equal to the
prescribed likelihood threshold are set as the attribute
value leading candidates, and all the retrieval key
candidates belonging to the attribute value leading
candidates are extracted from the speech recognition
database as the recognition target. Namely, if the number
of the attribute value leading candidates is n, the
retrieval key candidates in n groups corresponding to the
classification according to the attribute value leading
candidates among the groups classified according to the
attribute values will be extracted as the recognition
target. Then, the user is urged to enter the speech input
for the requested retrieval key, and the confirmation
process for presenting the retrieval key candidates in a
descending order of the recognition likelihood obtained by
the speech recognition processing for the retrieval key
using the retrieval key candidates as the retrieval target
is carried out in an attempt to determine the retrieval key
from the retrieval key candidates.

In this way, the third scheme of the present invention
narrows down the recognition target from the large scale
speech recognition database, and does not carry out the
confirmation process for determining the attribute value
uniquely in the process of initially requesting the user to
enter the attribute value of the attribute item of the
retrieval key, so that the confirmation process with
respect to the user is carried out only once for the
retrieval key determination, and the circuitousness due to
the repeated confirmation processes required in the
conventional attribute value determination can be
eliminated and furthermore the processing time can be
shortened.
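
The extraction of the recognition target in this third scheme can be sketched as follows. The function name extract_recognition_target, the dictionary representation of the attribute groups, and the sample likelihoods are assumptions for illustration.

```python
def extract_recognition_target(attr_candidates, groups, threshold):
    """Return (leading attribute values, retrieval keys to recognize next).

    attr_candidates: (attribute_value, likelihood) pairs in descending order.
    groups:          attribute value -> retrieval key candidates in its group.
    threshold:       prescribed likelihood threshold.
    """
    # Keep every attribute value at or above the threshold; no confirmation
    # dialogue is used to pick a single attribute value.
    leading = [attr for attr, score in attr_candidates if score >= threshold]
    # The recognition target is the union of the n corresponding groups.
    target = [key for attr in leading for key in groups.get(attr, [])]
    return leading, target


# Hypothetical classification of retrieval keys (cities) by attribute
# value (prefecture) and a hypothetical recognition result.
groups = {
    "Aichi": ["Nagoya", "Toyota"],
    "Akita": ["Akita", "Yokote"],
    "Oita": ["Oita", "Beppu"],
}
attr_candidates = [("Aichi", 0.62), ("Akita", 0.55), ("Oita", 0.20)]
leading, target = extract_recognition_target(attr_candidates, groups,
                                             threshold=0.5)
```

With two leading attribute values (n = 2), the recognition target is the union of two groups, and the only confirmation dialogue that remains is the one for the retrieval key itself.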

Next, in the fourth scheme of the present invention, a
recording medium that records the retrieval database to be
used in determining the retrieval key at the retrieval
apparatus in response to the user's input of the retrieval
key is formed in a two level hierarchical structure, where
the higher level hierarchical data contain the number of
data that can be recognition processed in real time as
specified by the system. On the other hand, the lower level
hierarchical data are formed such that the retrieval key is
contained, the number of data that cannot be recognition
processed in real time are contained, each data contained
in the lower level is always conceptually dependent on one
data in the higher level, and the number of data in the
lower level that are conceptually dependent on one data in
the higher level is set to be the number of data that can
be recognition processed in real time. In addition,
access frequency information indicating the bias of the
access frequencies among the data in the lower level is
provided and the data in the lower level are maintained
such that a high frequency access data group and the other
remaining data are distinguished according to the access
frequency information.

Also, this fourth scheme realizes the speech
recognition based interactive information retrieval aiming
at the determination of the entered retrieval key from the
speech recognition database by carrying out the speech
recognition processing for the retrieval key entered by the
user as the speech input, as follows.

When the speech input for the requested retrieval key
is entered by the user, the recognition and retrieval
processing for the high frequency access data group is
carried out at higher priority (procedure 1), and the
confirmation process for presenting the retrieval result
candidates in a descending order of the recognition
likelihood obtained as a result of the speech recognition
processing for the retrieval key is carried out (procedure
2). If the retrieval key can be determined by the number of
the confirmation processes less than or equal to a
prescribed number in the procedure 2, the retrieval key is
determined (procedure 3).

If the confirmation processes of the prescribed number
of times are negated by the user as not a correct retrieval
key in the procedure 3, the related query for inquiring a
generic concept on which the requested retrieval key
depends is carried out using the higher level data as the
recognition target (procedure 4). Then, the speech
recognition for the user's response to the related query is
carried out and, using the recognition likelihoods of the
obtained generic concept candidates, the confirmation
process for presenting the generic concept candidates in a
descending order of the recognition likelihood is carried
out until the generic concept is determined (procedure 5).
When the generic concept is determined, the lower level
data that depend on the determined higher level data are
selectively extracted as the recognition target data
(procedure 6). Then, the speech recognition processing for
the retrieval key entered by the user is carried out again
and the confirmation process for presenting the obtained
retrieval key candidates in a descending order of the
recognition likelihood is carried out so as to determine
the speech retrieval key (procedure 7).
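
Procedures 1 to 7 above can be condensed into the following Python sketch. It is only an illustration under assumptions: the toy recognize function uses string similarity from difflib instead of a speech recognizer, the confirm and ask_generic callbacks stand in for the speech dialogue with the user, and all names and sample data are hypothetical.

```python
from difflib import SequenceMatcher


def recognize(utterance, words):
    """Toy recognizer: rank words by string similarity to the input."""
    scored = [(w, SequenceMatcher(None, utterance, w).ratio()) for w in words]
    return sorted(scored, key=lambda pair: -pair[1])


def determine_key(utterance, high_freq, lower_by_generic, confirm,
                  ask_generic, max_confirmations=2):
    # Procedures 1-3: recognize against the high frequency access data
    # group first and confirm at most max_confirmations candidates.
    for key, _ in recognize(utterance, high_freq)[:max_confirmations]:
        if confirm(key):
            return key
    # Procedures 4-5: related query for the generic concept, recognized
    # against the higher level data and confirmed in likelihood order.
    reply = ask_generic()
    generic = next((c for c, _ in recognize(reply, list(lower_by_generic))
                    if confirm(c)), None)
    if generic is None:
        return None
    # Procedures 6-7: recognize the original input again, now only against
    # the lower level data that depend on the determined generic concept.
    for key, _ in recognize(utterance, lower_by_generic[generic]):
        if confirm(key):
            return key
    return None


# Hypothetical data: the wanted key is not in the high frequency group.
high_freq = ["Tokyo", "Osaka"]
lower_by_generic = {"Hokkaido": ["Sapporo", "Wakkanai"], "Okinawa": ["Naha"]}
wanted = {"Wakkanai", "Hokkaido"}
result = determine_key("Wakkanai", high_freq, lower_by_generic,
                       confirm=lambda item: item in wanted,
                       ask_generic=lambda: "Hokkaido")
```

Note that the user's first utterance is kept and reused in procedure 7, so the user is never asked to repeat the retrieval key.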

In this fourth scheme, when the requested retrieval key is
contained in the high frequency access data group, it is
possible to determine the retrieval key in real time using
only the input of the retrieval key that the user really
wants to request, without carrying out the related query to
inquire a generic concept as assistance for narrowing down
the retrieval key, so that the fast retrieval can be
realized. Even when the requested retrieval key is not
contained in the high frequency access data group, the user
is urged to enter the retrieval key that the user really
wants to request first, and then urged to enter a generic
concept as assisting information, which is natural unlike
the conventional scheme in which the user is forced to
start from the assisting query to inquire a generic concept
in order to realize the effective narrowing down from a
viewpoint of the system. It is also possible to determine
the retrieval key entered by the user as the speech input
from the large scale speech recognition database formed by
data that cannot be processed in real time and that have
the bias in the access frequencies, using the natural
dialogue with the user in which the user is urged to enter
the retrieval key that the user really wants to request
first, without making the user conscious of the time
required for the speech recognition processing and
incompleteness of the speech recognition accuracy.

Assuming that the speech recognition accuracy is 100%
and the candidate determination by the real time speech
recognition processing takes T1 (sec), in the conventional
scheme in which a generic concept for narrowing down the
recognition target words is inquired first as the retrieval
assist key rather than the retrieval key that the user
really wants to request, and the input of the retrieval key
is urged after the generic concept is determined and the
specific concepts that are dependent on the generic concept
are extracted as the retrieval target in order to realize
the recognition processing in real time, 2 x T1 (sec) will
be required because the determination process is carried
out with the user twice for the generic concept (retrieval
assist key) and the retrieval key.

On the other hand, in this fourth scheme in which the
high frequency access data group of the lower level is
formed by data having the access frequency of 80%, the
input of the retrieval key that the user really wants to
request is urged first, and the retrieval processing is
carried out at higher priority for the high frequency
access data group, only T1 (sec) is required in the case
where the requested retrieval key is contained in the high
frequency access data group whereas 2 x T1 (sec) is
required in the case where the requested retrieval key is
not contained in the high frequency access data group
because a method for narrowing down by inquiring the
generic concept next is adopted, and therefore 0.8 x T1 +
0.2 x 2 x T1 = 1.2 x T1 (sec) is required overall, so that
the expectation value for the time required in the
retrieval key determination is smaller in this fourth
scheme.

In practice, the speech recognition accuracy is not
100% so that it is difficult to complete the retrieval
processing in the above processing time, but if the speech
recognition device has such a recognition accuracy that the
first candidate is a correct one at a probability of 50%,
the second candidate is a correct one at a probability of
40%, and the third candidate is a correct one at a
probability of 10%, assuming that the correct retrieval key
is obtained from the first three candidates when the
correct retrieval key is contained in the speech
recognition database, and assuming that the confirmation
process requires T1 (sec), the conventional scheme will
require 0.5 x T1 + 0.4 x 2 x T1 + 0.1 x 3 x T1 = 1.6 x T1
(sec) (the confirmation process time in the case where the
second candidate is a correct one is 2 x T1 (sec) because
the confirmation process is carried out twice). Then, after
narrowing down the recognition target to the number of
words that can be processed in real time using the generic
concept, the determination of the retrieval key requested
by the user will also require 1.6 x T1 (sec), so that 1.6
x T1 + 1.6 x T1 = 3.2 x T1 (sec) will be required overall.

On the other hand, in this fourth scheme, using the
similar speech recognition accuracy and the high frequency
access data group formed by data having the access
frequency of 80%, and assuming that the confirmation
process for the retrieval key candidates obtained from the
lower level is carried out up to twice when the requested
retrieval key is contained in the high frequency access
data group, 0.8 x 0.5 x T1 + 0.8 x 0.4 x 2 x T1 = 1.04 x T1
(sec) will be required for the retrieval key determination
in the case where the correct retrieval key is obtained in
the first two candidates. Also, this case adopts a method
for narrowing down the retrieval range by inquiring the
generic concept when the correct retrieval key is not
obtained in the first two candidates even if the correct
retrieval key is contained in the high frequency access
data group, so that 0.5 x T1 + 0.4 x 2 x T1 + 0.1 x 3 x T1
= 1.6 x T1 (sec) will be required in 10% of times (which is
a probability by which the third candidate is the correct
one), so that 1.6 x T1 x 0.1 = 0.16 x T1 (sec) will be
required. Also, the same method is used when the requested
retrieval key is not contained in the high frequency access
data group so that 1.6 x T1 (sec) will be required in 20%
of times (in the case where the access frequency of the
requested retrieval key is less than 20%), so that 1.6 x T1
x 0.2 = 0.32 x T1 (sec) will be required. Thus, when
the speech recognition accuracy is not 100%, this fourth
scheme will require 1.04 x T1 + 0.16 x T1 + 0.32 x T1 =
1.52 x T1 (sec) overall.
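
The arithmetic in the above comparison can be checked with a short Python sketch. It simply reproduces the accounting given in the text, term by term, rather than deriving an independent model; the variable names are hypothetical and T1 is set to 1 so that results read as multiples of T1.

```python
T1 = 1.0                  # time for one determination/confirmation (sec)
p_rank = [0.5, 0.4, 0.1]  # probability that the 1st/2nd/3rd candidate is correct
p_high = 0.8              # access frequency covered by the high frequency group

# Ideal case (100% recognition accuracy).
ideal_conventional = 2 * T1
ideal_fourth = p_high * T1 + (1 - p_high) * 2 * T1               # 1.2 x T1

# One determination with up to three confirmations:
# 0.5 x T1 + 0.4 x 2 x T1 + 0.1 x 3 x T1 = 1.6 x T1
one_pass = sum(p * (i + 1) * T1 for i, p in enumerate(p_rank))

# Conventional scheme: generic concept first, then the retrieval key.
conventional = one_pass + one_pass                               # 3.2 x T1

# Fourth scheme, following the three terms given in the text.
in_first_two = p_high * (p_rank[0] * T1 + p_rank[1] * 2 * T1)    # 1.04 x T1
third_candidate = one_pass * p_rank[2]                           # 0.16 x T1
not_in_group = one_pass * (1 - p_high)                           # 0.32 x T1
fourth = in_first_two + third_candidate + not_in_group           # 1.52 x T1
```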

Consequently, the expectation value for the time
required in the retrieval key determination is reduced in
this fourth scheme to less than a half compared with the
conventional scheme. Moreover, this fourth scheme has the
naturalness in that the user is first urged to enter the
retrieval key that the user really wants to request, rather
than starting from an assisting query for the purpose of
the effective narrowing down from a viewpoint of the
system.

According to one aspect of the present invention there
is provided a method of speech recognition based
interactive information retrieval for ascertaining and
retrieving target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising the steps of: (a)
storing retrieval key candidates that constitute a number
of data that cannot be processed by the speech recognition
processing in a prescribed processing time, as recognition
target words in a speech recognition database, the
recognition target words being divided into prioritized
recognition target words that constitute a number of data
that can be processed by the speech recognition processing
in the prescribed processing time and that have relatively
higher importance levels based on statistical information
among the recognition target words, and non-prioritized
recognition target words other than the prioritized
recognition target words; (b) requesting the user by a
speech dialogue with the user to enter a speech input
indicating the retrieval key, and carrying out the speech
recognition processing for the speech input with respect to
the prioritized recognition target words to obtain a
recognition result; (c) carrying out a confirmation process
using a speech dialogue with the user according to the
recognition result to determine the retrieval key, when the
recognition result satisfies a prescribed condition for
judging that the retrieval key can be determined only by a
confirmation process with the user; (d) carrying out a
related information query using a speech dialogue with the
user to request the user to enter another speech input for
related information of the retrieval key, when the
recognition result does not satisfy the prescribed
condition; (e) carrying out the speech recognition
processing for the other speech input to obtain another
recognition result, and adjusting the recognition result
according to the other recognition result to obtain an
adjusted recognition result; and (f) repeating the step (c)
or the steps (d) and (e) using the adjusted recognition
result in place of the recognition result, until the
retrieval key is determined.

According to another aspect of the present invention
there is provided a method of speech recognition based
interactive information retrieval for ascertaining and
retrieving target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising the steps of: (a)
storing retrieval key candidates that are classified
according to attribute values of an attribute item in a
speech recognition database; (b) requesting the user by a
speech dialogue with the user to enter a speech input
indicating an attribute value of the attribute item for the
retrieval key, and carrying out the speech recognition
processing for the speech input to obtain a recognition
result indicating attribute value candidates and their
recognition likelihoods; (c) selecting those attribute
value candidates which have recognition likelihoods that
exceed a prescribed likelihood threshold as
attribute value leading candidates, and extracting those
retrieval key candidates that belong to the attribute value
leading candidates as new recognition target data; (d)
requesting the user by a speech dialogue with the user to
enter another speech input indicating the retrieval key,
and carrying out the speech recognition processing for the
other speech input with respect to the new recognition
target data to obtain another recognition result; and (e)
carrying out a confirmation process using a speech dialogue
with the user according to the other recognition result
to determine the retrieval key.

According to another aspect of the present invention
there is provided a method of speech recognition based
interactive information retrieval for ascertaining and
retrieving target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising the steps of: (a)
storing retrieval key candidates that constitute a number
of data that cannot be processed by the speech recognition
processing in a prescribed processing time, in a plurality
of statistically hierarchized databases provided in a
speech recognition database, where lower level
statistically hierarchized databases contain increasingly
larger parts of the retrieval key candidates such that a
lowest level statistically hierarchized database contains
all the retrieval key candidates; (b) requesting the user
by a speech dialogue with the user to enter a speech input
indicating the retrieval key, and carrying out the speech
recognition processing for the speech input with respect to
all of the plurality of statistically hierarchized
databases in parallel, to sequentially obtain respective
recognition results indicating recognition retrieval key
candidates and their recognition likelihoods; (c) selecting
those recognition retrieval key candidates which have
recognition likelihoods that exceed a prescribed
likelihood threshold as recognition retrieval key leading
candidates, for each statistically hierarchized database
for which the speech recognition processing is completed;
and (d) controlling a next speech dialogue with the user
according to whether a prescribed condition that a number
of the recognition retrieval key leading candidates is less
than or equal to a prescribed number but not zero is
satisfied or not.

According to another aspect of the present invention 
there is provided a speech recognition based interactive 
information retrieval apparatus for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising: a speech recognition
database configured to store retrieval key candidates that
constitute a number of data that cannot be processed by the
speech recognition processing in a prescribed processing
time, as recognition target words, the recognition target
words being divided into prioritized recognition target
words that constitute a number of data that can be
processed by the speech recognition processing in the
prescribed processing time and that have relatively higher
importance levels based on statistical information among
the recognition target words, and non-prioritized
recognition target words other than the prioritized
recognition target words; a speech recognition unit
configured to carry out the speech recognition processing;
and a dialogue control unit configured to carry out speech
dialogues with the user; wherein the dialogue control unit
carries out a speech dialogue for requesting the user to
enter a speech input indicating the retrieval key, such
that the speech recognition unit carries out the speech
recognition processing for the speech input with respect to
the prioritized recognition target words to obtain a
recognition result; the dialogue control unit carries out a
speech dialogue for a confirmation process according to the
recognition result to determine the retrieval key, when the
recognition result satisfies a prescribed condition for
judging that the retrieval key can be determined only by a
confirmation process with the user; the dialogue control
unit carries out a speech dialogue for a related
information query to request the user to enter another
speech input for a related information of the retrieval
key, when the recognition result does not satisfy the
prescribed condition, such that the speech recognition unit
carries out the speech recognition processing for the
another speech input to obtain another recognition result
and the dialogue control unit adjusts the recognition
result according to the another recognition result to
obtain adjusted recognition result, and the dialogue
control unit controls the speech dialogues to repeat the
confirmation process or the related information query using
the adjusted recognition result in place of the recognition
result, until the retrieval key is determined.

According to another aspect of the present invention
there is provided a speech recognition based interactive
information retrieval apparatus for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising: a speech recognition
database configured to store retrieval key candidates that
are classified according to attribute values of an
attribute item; a speech recognition unit configured to
carry out the speech recognition processing; and a dialogue
control unit configured to carry out speech dialogues with
the user; wherein the dialogue control unit carries out a
speech dialogue for requesting the user to enter a speech
input indicating an attribute value of the attribute item
for the retrieval key, such that the speech recognition
unit carries out the speech recognition processing for the
speech input to obtain a recognition result indicating
attribute value candidates and their recognition
likelihoods; the dialogue control unit selects those
attribute value candidates which have recognition
likelihoods that are exceeding a prescribed likelihood
threshold as attribute value leading candidates, and
extracts those retrieval key candidates that belong to the
attribute value leading candidates as new recognition
target data; the dialogue control unit carries out a speech
dialogue for requesting the user to enter another speech
input indicating the retrieval key, such that the speech
recognition unit carries out the speech recognition
processing for the another speech input with respect to the
new recognition target data to obtain another recognition
result; and the dialogue control unit carries out a speech
dialogue for a confirmation process according to the
another recognition result to determine the retrieval key.
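
The narrowing of the recognition target described in this aspect can be sketched as follows. This is a minimal illustration under assumptions: the function name `narrow_by_attribute`, the likelihood values, and the small prefecture-to-city table are hypothetical and are not taken from the embodiments.

```python
def narrow_by_attribute(attribute_scores, keys_by_attribute, threshold):
    """Select attribute value leading candidates (likelihood above the
    threshold) and extract the retrieval key candidates that belong to them
    as the new recognition target data."""
    leading = [a for a, s in attribute_scores.items() if s > threshold]
    new_targets = []
    for attribute_value in leading:
        new_targets.extend(keys_by_attribute.get(attribute_value, []))
    return leading, new_targets


# Hypothetical recognition result for a spoken prefecture name.
attribute_scores = {"Kanagawa": 0.9, "Kagawa": 0.7, "Nagano": 0.2}
# Hypothetical classification of retrieval keys (cities) by prefecture.
keys_by_attribute = {
    "Kanagawa": ["Yokohama", "Kawasaki"],
    "Kagawa": ["Takamatsu"],
    "Nagano": ["Matsumoto"],
}
leading, new_targets = narrow_by_attribute(
    attribute_scores, keys_by_attribute, threshold=0.5
)
```

The second speech input would then be recognized only against `new_targets`, which is much smaller than the full set of retrieval key candidates.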

According to another aspect of the present invention 
there is provided a speech recognition based interactive
information retrieval apparatus for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing, comprising: a speech recognition
database having a plurality of statistically hierarchized
databases configured to store retrieval key candidates that
constitute a number of data that cannot be processed by the
speech recognition processing in a prescribed processing
time, where lower level statistically hierarchized
databases contain increasingly larger part of the retrieval
key candidates such that a lowest level statistically
hierarchized database contains all the retrieval key
candidates; a speech recognition unit configured to carry
out the speech recognition processing; and a dialogue
control unit configured to carry out speech dialogues with
the user; wherein the dialogue control unit carries out a
speech dialogue for requesting the user to enter a speech
input indicating the retrieval key, such that the speech
recognition unit carries out the speech recognition
processing for the speech input with respect to all of the
plurality of statistically hierarchized databases in
parallel, to sequentially obtain respective recognition
results indicating recognition retrieval key candidates and
their recognition likelihoods; the dialogue control unit
selects those recognition retrieval key candidates which
have recognition likelihoods that are exceeding a
prescribed likelihood threshold as recognition retrieval
key leading candidates, for each statistically hierarchized
database for which the speech recognition processing is
completed; and the dialogue control unit controls a next
speech dialogue with the user according to whether a
prescribed condition that a number of the recognition
retrieval key leading candidates is less than or equal to a
prescribed number but not zero is satisfied or not.

According to another aspect of the present invention 
there is provided a computer usable medium having computer
readable program codes embodied therein for causing a
computer to function as a speech recognition based
interactive information retrieval system for ascertaining
and retrieving a target information of a user by
determining a retrieval key entered by the user using a
speech recognition processing and a speech recognition
database for storing retrieval key candidates that
constitute a number of data that cannot be processed by the
speech recognition processing in a prescribed processing
time, as recognition target words in a speech recognition
database, the recognition target words being divided into
prioritized recognition target words that constitute a
number of data that can be processed by the speech
recognition processing in the prescribed processing time
which have relatively higher importance levels based on
statistical information among the recognition target words,
and non-prioritized recognition target words other than the
prioritized recognition target words, the computer readable
program codes include: a first computer readable program
code for causing said computer to request the user by a
speech dialogue with the user to enter a speech input
indicating the retrieval key, and carry out the speech
recognition processing for the speech input with respect to
the prioritized recognition target words to obtain a
recognition result; a second computer readable program code
for causing said computer to carry out a confirmation
process using a speech dialogue with the user according to
the recognition result to determine the retrieval key, when
the recognition result satisfies a prescribed condition for
judging that the retrieval key can be determined only by a
confirmation process with the user; a third computer
readable program code for causing said computer to carry
out a related information query using a speech dialogue
with the user to request the user to enter another speech
input for a related information of the retrieval key, when
the recognition result does not satisfy the prescribed
condition; a fourth computer readable program code for
causing said computer to carry out the speech recognition
processing for the another speech input to obtain another
recognition result, and adjust the recognition result
according to the another recognition result to obtain
adjusted recognition result; and a fifth computer readable
program code for causing said computer to repeat processing
of the second computer readable program code or the third
and fourth computer readable program codes using the
adjusted recognition result in place of the recognition
result, until the retrieval key is determined.

According to another aspect of the present invention 
there is provided a computer usable medium storing a data 
structure to be used as a speech recognition database in a
speech recognition based interactive information retrieval
system for ascertaining and retrieving a target information
of a user by determining a retrieval key entered by the
user using a speech recognition processing, the data
structure comprising: retrieval key candidates that
constitute a number of data that cannot be processed by the
speech recognition processing in a prescribed processing
time, as recognition target words, the recognition target
words being divided into prioritized recognition target
words that constitute a number of data that can be
processed by the speech recognition processing in the
prescribed processing time which have relatively higher
importance levels based on statistical information among
the recognition target words, and non-prioritized
recognition target words other than the prioritized
recognition target words.

According to another aspect of the present invention 
there is provided a computer usable medium having computer 
readable program codes embodied therein for causing a 
computer to function as a speech recognition based 
interactive information retrieval system for ascertaining
and retrieving a target information of a user by
determining a retrieval key entered by the user using a
speech recognition processing and a speech recognition
database for storing retrieval key candidates that are
classified according to attribute values of an attribute
item, the computer readable program codes include: a first
computer readable program code for causing said computer to
request the user by a speech dialogue with the user to
enter a speech input indicating an attribute value of the
attribute item for the retrieval key, and carry out the
speech recognition processing for the speech input to
obtain a recognition result indicating attribute value
candidates and their recognition likelihoods; a second
computer readable program code for causing said computer to
select those attribute value candidates which have
recognition likelihoods that are exceeding a prescribed
likelihood threshold as attribute value leading candidates,
and extract those retrieval key candidates that belong to
the attribute value leading candidates as new recognition
target data; a third computer readable program code for
causing said computer to request the user by a speech
dialogue with the user to enter another speech input
indicating the retrieval key, and carry out the speech
recognition processing for the another speech input with
respect to the new recognition target data to obtain
another recognition result; and a fourth computer readable
program code for causing said computer to carry out a
confirmation process using a speech dialogue with the user
according to the another recognition result to determine
the retrieval key.

According to another aspect of the present invention
there is provided a computer usable medium having computer
readable program codes embodied therein for causing a
computer to function as a speech recognition based
interactive information retrieval system for ascertaining
and retrieving a target information of a user by
determining a retrieval key entered by the user using a
speech recognition processing and a speech recognition
database having a plurality of statistically hierarchized
databases for storing retrieval key candidates that
constitute a number of data that cannot be processed by the
speech recognition processing in a prescribed processing
time, where lower level statistically hierarchized
databases contain increasingly larger part of the retrieval
key candidates such that a lowest level statistically
hierarchized database contains all the retrieval key
candidates, the computer readable program codes include: a
first computer readable program code for causing said
computer to request the user by a speech dialogue with the
user to enter a speech input indicating the retrieval key,
and carry out the speech recognition processing for the
speech input with respect to all of the plurality of
statistically hierarchized databases in parallel, to
sequentially obtain respective recognition results
indicating recognition retrieval key candidates and their
recognition likelihoods; a second computer readable program
code for causing said computer to select those recognition
retrieval key candidates which have recognition likelihoods
that are exceeding a prescribed likelihood threshold as
recognition retrieval key leading candidates, for each
statistically hierarchized database for which the speech
recognition processing is completed; and a third computer
readable program code for causing said computer to control
a next speech dialogue with the user according to whether a
prescribed condition that a number of the recognition
retrieval key leading candidates is less than or equal to a
prescribed number but not zero is satisfied or not.

Other features and advantages of the present invention 
will become apparent from the following description taken 
in conjunction with the accompanying drawings.


BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 is a block diagram showing an exemplary
configuration of a speech recognition based interactive
information retrieval apparatus in the first embodiment of
the present invention.

Fig. 2 is a diagram showing an exemplary information
database to be utilized in the speech recognition based
interactive information retrieval apparatus of Fig. 1.

Fig. 3 is a flow chart for an information
determination processing procedure in the speech
recognition based interactive information retrieval
apparatus of Fig. 1.

Fig. 4 is a diagram showing an exemplary information
database in a concrete example for an interactive
information retrieval method in the first embodiment of the
present invention.

Fig. 5 is a diagram showing an exemplary recognition
result with respect to prioritized recognition target words
in a concrete example for an interactive information
retrieval method in the first embodiment of the present
invention.

Fig. 6 is a diagram showing an exemplary recognition
result for a related attribute (prefecture) in a concrete
example of an interactive information retrieval method in
the first embodiment of the present invention.

Fig. 7 is a diagram showing an exemplary result of
adding a recognition result with respect to non-prioritized
recognition target words in a concrete example of an
interactive information retrieval method in the first
embodiment of the present invention.

Fig. 8 is a diagram showing an exemplary
cross-checking of attribute value candidates and related
information in a concrete example of an interactive
information retrieval method in the first embodiment of the
present invention.

Fig. 9 is a block diagram showing an exemplary
configuration of a speech recognition based interactive
information retrieval apparatus in the second embodiment of
the present invention.

Fig. 10 is a diagram showing an example of
statistically hierarchized databases to be utilized in the
speech recognition based interactive information retrieval
apparatus of Fig. 9.

Fig. 11 is an exemplary speech recognition result
table with calculated recognition likelihoods with respect
to speech retrieval key candidates that is to be utilized
in the speech recognition based interactive information
retrieval apparatus of Fig. 9.

Fig. 12 is a diagram showing an exemplary retrieval
key attribute database to be utilized in the speech
recognition based interactive information retrieval
apparatus of Fig. 9.

Fig. 13 is a diagram showing an exemplary related
information recognition result table indicating a speech
recognition result for a user's response to a retrieval key
determination related query that is utilized in the speech
recognition based interactive information retrieval
apparatus of Fig. 9.

Fig. 14 is a flow chart for a processing procedure of
a dialogue control unit in the speech recognition based
interactive information retrieval apparatus of Fig. 9.

Fig. 15 is a diagram showing an example of
statistically hierarchized databases for speech recognition
in a concert ticket reservation system which is a concrete
example of an interactive information retrieval method in
the second embodiment of the present invention.

Fig. 16 is a diagram showing an exemplary speech
recognition result table with respect to a first level
statistically hierarchized database in a concert ticket
reservation system which is a concrete example of an
interactive information retrieval method in the second
embodiment of the present invention.

Fig. 17 is a diagram showing an exemplary retrieval
key attribute database in a concert ticket reservation
system which is a concrete example of an interactive
information retrieval method in the second embodiment of
the present invention.

Fig. 18 is a diagram showing an exemplary related
information recognition result table obtained from a
response to a retrieval key determination related query for
inquiring a concert date in a concert ticket reservation
system which is a concrete example of an interactive
information retrieval method in the second embodiment of
the present invention.

Fig. 19 is a diagram showing an exemplary speech
recognition result with respect to a second level
statistically hierarchized database in a concert ticket
reservation system which is a concrete example of an
interactive information retrieval method in the second
embodiment of the present invention.

Fig. 20 is a diagram showing an exemplary
cross-checking of a second level statistically hierarchized
database and a related information recognition result table
for a concert date in a concert ticket reservation system
which is a concrete example of an interactive information
retrieval method in the second embodiment of the present
invention.

Fig. 21 is a diagram showing an exemplary related
information recognition result table obtained from a
response to a retrieval key determination related query for
inquiring a place of a concert in a concert ticket
reservation system which is a concrete example of an
interactive information retrieval method in the second
embodiment of the present invention.

Fig. 22 is a diagram showing an exemplary
cross-checking of speech retrieval key leading candidates in a
second level statistically hierarchized database and a
concert date and a place of a concert in a concert ticket
reservation system which is a concrete example of an
interactive information retrieval method in the second
embodiment of the present invention.

Fig. 23 is a block diagram showing an exemplary
configuration of a speech recognition based interactive
information retrieval apparatus in the third embodiment of
the present invention.

Fig. 24 is a diagram showing an exemplary speech
recognition database to be utilized in the speech
recognition based interactive information retrieval
apparatus of Fig. 23.

Fig. 25 is a diagram showing an exemplary attribute
database to be utilized in the speech recognition based
interactive information retrieval apparatus of Fig. 23.

Fig. 26 is a flow chart for a retrieval key
determination processing procedure in the speech
recognition based interactive information retrieval
apparatus of Fig. 23.

Fig. 27 is a diagram showing an exemplary speech
recognition database in a city/town determination system
which is a concrete example of an interactive information
retrieval method in the third embodiment of the present
invention.

Fig. 28 is a diagram showing an exemplary attribute
database in a city/town determination system which is a
concrete example of an interactive information retrieval
method in the third embodiment of the present invention.

Fig. 29 is a diagram showing an exemplary recognition
result for an attribute value in a city/town determination
system which is a concrete example of an interactive
information retrieval method in the third embodiment of the
present invention.

Fig. 30 is a diagram showing an exemplary result of
narrowing down a recognition target in a city/town
determination system which is a concrete example of an
interactive information retrieval method in the third
embodiment of the present invention.

Fig. 31 is a diagram showing an exemplary recognition
result for a retrieval key in a city/town determination
system which is a concrete example of an interactive
information retrieval method in the third embodiment of the
present invention.

Fig. 32 is a block diagram showing an exemplary
configuration of a speech recognition based interactive
information retrieval apparatus in the fourth embodiment of
the present invention.

Fig. 33 is a diagram showing an exemplary speech
recognition database to be utilized in the speech
recognition based interactive information retrieval
apparatus of Fig. 32.

Fig. 34 is a flow chart for an interactive information
retrieval processing procedure in the speech recognition
based interactive information retrieval apparatus of Fig.
32.

Fig. 35 is a diagram showing an exemplary speech
recognition database in a city/town determination system
which is a concrete example of an interactive information
retrieval method in the fourth embodiment of the present
invention.

Fig. 36 is a diagram showing an exemplary high
frequency access data group in a city/town determination
system which is a concrete example of an interactive
information retrieval method in the fourth embodiment of
the present invention.

Fig. 37 is a diagram showing an exemplary speech
retrieval key recognition result in the case of determining
"Yokohama" in a city/town determination system which is a
concrete example of an interactive information retrieval
method in the fourth embodiment of the present invention.

Fig. 38 is a diagram showing an exemplary speech
retrieval key recognition result in the case of determining
"Yokokawa" using a high frequency access data group as a
recognition target in a city/town determination system
which is a concrete example of an interactive information
retrieval method in the fourth embodiment of the present
invention.

Fig. 39 is a diagram showing an exemplary speech
retrieval key recognition result in the case of determining
"Yokokawa" using cities or towns in Gunma as a recognition
target in a city/town determination system which is a
concrete example of an interactive information retrieval
method in the fourth embodiment of the present invention.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to Fig. 1 to Fig. 8, the first
embodiment directed to the above described first scheme of
the present invention will be described in detail.

Fig. 1 shows an exemplary configuration of a speech
recognition based interactive information retrieval
apparatus (which will also be referred to as interactive
information retrieval apparatus for short) in the first
embodiment of the present invention. This interactive
information retrieval apparatus 1 comprises a speech input
unit 2, a speech identification unit 3, a dialogue control
unit 4, and a speech output unit 5. The speech
identification unit 3 further comprises a speech
recognition unit 3-1 and a speech recognition result output
unit 3-2. The dialogue control unit 4 further comprises a
result adjustment unit 4-1, a dialogue leading unit 4-2 and
a query and response generation unit 4-3. The speech
identification unit 3 utilizes a speech recognition device
6, and the speech output unit 5 utilizes a speech output
device 8. Also, the speech recognition processing for input
speech at the speech identification unit 3 and the result
adjustment unit 4-1 and the dialogue leading unit 4-2 of
the dialogue control unit 4 utilize a system database 7.
The system database 7 comprises an information database 7-1
that records target information intended by users, and a
YES/NO type template database 7-2.

Fig. 2 shows an exemplary overview of the information
database 7-1, which contains a plurality of attributes and
their attribute values in a form of a set of attribute
databases for respective attributes, where different
attributes may have different numbers of attribute values.
The attributes are hierarchically related with each other.
The interactive information retrieval apparatus 1 defines
importance levels according to statistical information such
as access frequencies with respect to attribute value
candidates of each attribute, and selects, in an order of
the importance levels, a prescribed number of attribute
values that are expected to be capable of being speech
recognition processed within a real dialogue processing
time, as prioritized recognition target words. The
remaining non-prioritized recognition target words are
recorded, in an order of the importance levels, in
subdivisions in units of the number of words that is
specified by the system in view of carrying out the
recognition processing in parallel to the dialogue with the
user, such as the number that can be processed by the
speech recognition processing in a real dialogue processing
time or the number that can be processed by the speech
recognition processing in a real related information query
dialogue time.

Note that the real dialogue processing time is defined
by the system as a time to be taken by the speech dialogue
with the user that is expected not to cause any stress on
the user and not to make the user conscious of any
unnaturalness.
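
The division of recognition target words described above can be sketched as follows. This is a minimal illustration under assumptions: access counts stand in for the statistical information, and the names `partition_targets`, `prioritized_size` and `chunk_size` are hypothetical parameters representing the numbers of words that the system judges can be processed within the respective dialogue times.

```python
def partition_targets(access_counts, prioritized_size, chunk_size):
    """Rank words by importance (here: access count, ties broken by name),
    take the top ones as prioritized recognition target words, and split the
    remaining words into fixed-size subdivisions in the same order."""
    ranked = sorted(access_counts, key=lambda w: (-access_counts[w], w))
    prioritized = ranked[:prioritized_size]
    rest = ranked[prioritized_size:]
    subdivisions = [
        rest[i:i + chunk_size] for i in range(0, len(rest), chunk_size)
    ]
    return prioritized, subdivisions


# Hypothetical access frequencies for five attribute values.
access_counts = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10}
prioritized, subdivisions = partition_targets(
    access_counts, prioritized_size=2, chunk_size=2
)
```

Each subdivision can then be handed to the recognizer one at a time while a dialogue with the user is in progress.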

This embodiment will describe the case in which the
interactive information retrieval apparatus 1 inquires to
the user about an attribute that has the number of
attribute values exceeding the number that can be processed
in a real dialogue processing time and that can enable the
target information determination efficiently by accounting
for the user's preference, among the attributes that
constitute the target information.

Fig. 3 shows a processing procedure for the target
information determination by the interactive information
retrieval apparatus 1 of this embodiment.

First, when the user selects an attribute of the
target information to be requested (step S1), the
interactive information retrieval apparatus 1 requests the
user to enter an attribute value of that attribute (step
S2), and when an attribute value of the requested attribute
is entered by the user at the speech input unit 2, the
input speech is sent to the speech identification unit 3
where the priority recognition processing for the received
user input is carried out at the speech recognition unit
3-1 using the speech recognition device 6 (step S3). Here,
the speech recognition device 6 selects a database to be
used as the recognition target from the system database 7
according to a stage of the processing by the interactive
information retrieval apparatus 1. Namely, the information
database 7-1 is selected for an attribute value input or a
response to a related information query, and the YES/NO
type template database 7-2 is selected for a user response
in the confirmation process. Also, when the information
database 7-1 is referred to, the recognition processing
using attribute values of the attribute that is a target of
the query in the database as the recognition target words is
carried out.
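
The selection of the recognition target according to the stage of the processing can be sketched as follows. This is a minimal illustration under assumptions: the `Stage` enumeration, the dictionary layout of `system_database`, and the sample words are hypothetical and only mirror the roles of the information database 7-1 and the YES/NO type template database 7-2.

```python
from enum import Enum


class Stage(Enum):
    ATTRIBUTE_VALUE_INPUT = 1
    RELATED_INFORMATION_QUERY = 2
    CONFIRMATION = 3


def select_recognition_targets(stage, system_database, attribute=None):
    """Return the recognition target words for the current dialogue stage."""
    if stage is Stage.CONFIRMATION:
        # A response to a confirmation query is matched against YES/NO templates.
        return system_database["yes_no_templates"]
    # Otherwise the attribute values of the queried attribute are the targets.
    return system_database["information"][attribute]


# Hypothetical contents of the system database.
system_database = {
    "information": {
        "city": ["Yokohama", "Yokokawa"],
        "prefecture": ["Kanagawa", "Gunma"],
    },
    "yes_no_templates": ["yes", "no"],
}
targets = select_recognition_targets(
    Stage.ATTRIBUTE_VALUE_INPUT, system_database, attribute="city"
)
```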

The speech recognition unit 3-1 carries out the
recognition processing for the attribute values specified
as the prioritized recognition target words of the
requested attribute in the information database 7-1. The
speech recognition result output unit 3-2 obtains the
recognition result and sends it to the dialogue control
unit 4.

The result adjustment unit 4-1 of the dialogue control
unit 4 holds the recognition result for the prioritized
recognition target words and sends it to the dialogue
leading unit 4-2. The dialogue leading unit 4-2 judges
whether or not the received recognition result satisfies a
prescribed condition defined in terms of the recognition
likelihood for judging that the attribute value can be
determined only by the confirmation process with the user
(step S4), and when this condition is satisfied, the
dialogue leading unit 4-2 commands the query and response
generation unit 4-3 to carry out the confirmation process.
The query and response generation unit 4-3 then generates a
query message for the confirmation process and sends it to
the speech output unit 5, and the speech output unit 5
outputs the query message for the confirmation process
while presenting candidates to the user, and requests a
response to the confirmation query (step S5).

The speech input unit 2 receives a response of the
user to the confirmation query and sends it to the speech
identification unit 3, and the speech recognition unit 3-1
recognizes the user response by using the YES/NO type
template database 7-2 as the recognition target, and sends
the recognition result to the dialogue control unit 4 (step
S6).

The result adjustment unit 4-1 sends the received
recognition result to the dialogue leading unit 4-2, and
the dialogue leading unit 4-2 judges whether the user
response indicates affirmation or not (step S7). When the
response indicating affirmation is obtained, the dialogue
leading unit 4-2 commands the query and response generation
unit 4-3 to generate a response message to notify the
attribute value determination success, and this response
message is outputted from the speech output unit 5 and the
attribute value is determined (step S8). If there is
another attribute which must be determined in order to
ascertain the target information, the similar processing is
repeated and then the target information is ascertained.

On the contrary, when the response indicating negation
is obtained with respect to the confirmation query (step S7
NO), or when the prescribed condition for judging that the
attribute value can be determined only by the confirmation
process with the user is not satisfied (step S4 NO), the
dialogue leading unit 4-2 determines to carry out the
related information query, and selects an attribute to be
inquired as the related information from the information
database 7-1 in the system database 7 (step S9). The query
and response generation unit 4-3 generates a query message
for inquiring the selected related information and sends it
to the speech output unit 5, so as to request the user to
enter an attribute value (step S10).

When it is determined to carry out the related
information query, the dialogue leading unit 4-2 also
commands the speech identification unit 3 to start the
recognition processing for the sets of the remaining
non-prioritized recognition target words that are
subdivided in units of the number specified by the system,
and the speech recognition unit 3-1 starts the recognition
processing for each set of the non-prioritized recognition
target words (step S11). The speech recognition result
output unit 3-2 sends the recognition result for each set
of the non-prioritized recognition target words, whenever
it is obtained, to the dialogue control unit 4, where it is
added to the recognition result for the prioritized
recognition target words that is held at the result
adjustment unit 4-1.

While the recognition processing for the non-prioritized
recognition target words is in progress inside the
interactive information retrieval apparatus 1, the query
message to inquire the related information is outputted
from the speech output unit 5 to the user. The speech input
unit 2 receives a user response to the related information
query and sends it to the speech identification unit 3,
which then carries out the priority recognition processing
for this user response (step S12).

The prescribed number of attribute values that
constitutes one set of the non-prioritized recognition
target words is defined such that the recognition
processing is already finished at least for the first one
set (comprising the prescribed number of attribute values)
at this point.

The speech identification unit 3 checks the progress
of the related information query whenever the recognition
processing for one set is finished during the recognition
processing for the non-prioritized recognition target
words. When the dialogue for the related information query
is continuing, the recognition result for the set of the
non-prioritized recognition target words is sent to the
dialogue control unit 4 and added to the recognition
result, held in the result adjustment unit 4-1, for those
attribute values for which the recognition has been
completed so far. Here, the recognition processing and the
adding of the recognition result are carried out for as
many sets of the non-prioritized recognition target words
as possible until the response to the related information
query is sent from the speech input unit 2.
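
The set-by-set processing described above can be pictured
with the following minimal Python sketch. It is an
illustration only and not the implementation of the
apparatus: the names recognize_in_background, recognize_set
and dialogue_continuing are hypothetical, and discarding the
result of a set that finishes after the user response has
arrived is an assumption, since the text specifies only what
is done while the dialogue is continuing.

```python
def recognize_in_background(word_sets, recognize_set, dialogue_continuing, held):
    """Add per-set results to `held` while the related query is ongoing.

    word_sets           -- non-prioritized sets, most important first
    recognize_set       -- hypothetical callable: set -> {word: likelihood}
    dialogue_continuing -- hypothetical callable: True until the user answers
    held                -- results kept by the result adjustment unit
    Returns the number of sets whose results were added.
    """
    added = 0
    for word_set in word_sets:
        result = recognize_set(word_set)
        # The progress of the related information query is checked
        # each time the processing for one set is finished.
        if not dialogue_continuing():
            break  # assumption: an interrupted set is not added
        held.update(result)
        added += 1
    return added
```

The sets left over when the loop stops remain available for
a later related information query, which matches the
resumption described further below.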

When the user response to the related information
query is received at the speech identification unit 3, the
speech recognition unit 3-1 starts the recognition
processing for the related information (attribute value) as
the recognition target using the information database 7-1
of the system database 7. The speech recognition result
output unit 3-2 sends the recognition result for the
response to the related information query to the dialogue
control unit 4.

The result adjustment unit 4-1 of the dialogue control
unit 4 cross-checks the received recognition result for the
related information and the recognition result for the
attribute values to which the recognition results obtained
up to that point have been added (step S13). At a time of
cross-checking, the likelihood of each attribute value
candidate to be a correct one is re-calculated by applying
a suitable operation on the recognition likelihood of each
attribute value candidate.

The dialogue leading unit 4-2 judges whether or not the
prescribed condition for judging that the attribute value
can be determined only by the confirmation process with the
user is satisfied, according to the re-calculated
likelihood (step S14), and commands the query and response
generation unit 4-3 to carry out the candidate presentation
and the confirmation query (step S5) or the further related
information query (step S9) depending on the judgement
result. When the presentation of the cross-checked result
is negated, the related information query is also carried
out. During the recognition processing for the response to
the related information query, the recognition processing
for the set of the non-prioritized recognition target words
is suspended.

Also, if there is a remaining set of the non-prioritized
recognition target words that has not yet been recognition
processed, the recognition processing and the result adding
for the remaining set are continued when it is determined
to carry out the related information query. Here, however,
at a time of cross-checking the recognition result for the
non-prioritized recognition target words at the result
adjustment unit 4-1 of the dialogue control unit 4, if
there exists the related information that has already been
obtained by the past related information query, the
recognition result for the attribute value candidates is
added after cross-checking with the already obtained
related information is done.

By repeating this series of operations until the
attribute value can be determined, the target information
is ascertained.

In the following, the interactive information
retrieval method of this embodiment will be described for a
concrete example. Here, the case of applying the
interactive information retrieval method of this embodiment
to an input interface for an "address determination system"
will be described. In this example, the target information
is an address (in Japan).

The number of address candidates for all of Japan
exceeds the number that can be processed in the real
dialogue processing time, so that the information database
to be utilized in the address determination is
hierarchically structured such that the prefectures (47
attribute values), cities or towns in the prefectures
(4,100 attribute values), and sections in the cities or
towns (180,000 attribute values) are used as the attributes
constituting the address, by setting the prefectures at the
highest level, the cities or towns at the next level, and
the sections at the lowest level. An example of the
information database to be utilized in the address
determination is shown in Fig. 4.

The current speech recognition technology is such that
it is impossible to complete the recognition processing for
4,100 candidates for the cities or towns and 180,000
candidates for the sections in the real dialogue processing
time. For this reason, the conventional method has no
choice but to adopt a method in which the prefecture is
inquired first, the confirmation is repeated until the
prefecture is determined, then the recognition target is
limited to the cities or towns in that prefecture and the
city or town is inquired and determined next. However, from
a viewpoint of the user, to be sequentially inquired from
the name of the prefecture is circuitous, and in the case
of specifying up to the section, it is necessary to carry
out the input requests at least three times for the
prefecture, the city or town, and the section, as well as
the repetition of the confirmation process until each input
is determined.

In this example, the case of specifying up to the city
or town of the address will be considered. The interactive
information retrieval apparatus defines the importance
levels with respect to the cities or towns according to
their past access frequencies, their sizes (populations),
etc., and selects the top 100 cities or towns that are
expected to be capable of being processed in the real
dialogue processing time as the prioritized recognition
target words. Then, the input of the name of the city or
town is requested to the user. According to the recognition
result for the city or town, whether the city or town can
be determined only by the confirmation process with the
user or not is judged. In this example, this judgement is
made according to the number of retrieval key candidates
that have the recognition likelihood greater than a
prescribed threshold, which is obtained by comparing the
recognition likelihood and the prescribed threshold. When
the number of the retrieval key candidates that have the
recognition likelihood greater than the prescribed
threshold is less than or equal to 2 but not 0, it is
judged that the retrieval key can be determined only by the
confirmation process, so that the confirmation process by
presenting the candidates is carried out. When the number
of candidates that have the recognition likelihood greater
than the prescribed threshold is 0 or greater than 2, the
related information query is carried out.
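
The judgement rule of this example can be summarized in a
short Python sketch. This is an illustration under stated
assumptions rather than the apparatus itself: the function
name choose_next_step, the dictionary representation of the
recognition result and the parameter max_confirmable are
hypothetical; only the rule "1 or 2 candidates above the
threshold leads to confirmation, otherwise the related
information query" is taken from the text.

```python
def choose_next_step(likelihoods, threshold, max_confirmable=2):
    """Decide between the confirmation process and a related query.

    likelihoods -- {candidate name: recognition likelihood}
    Returns ("confirm", candidates ordered by likelihood) or
    ("related_query", []).
    """
    above = sorted(
        (name for name, score in likelihoods.items() if score > threshold),
        key=lambda name: likelihoods[name],
        reverse=True,
    )
    # 1 or 2 candidates above the threshold: present them for confirmation.
    if 1 <= len(above) <= max_confirmable:
        return "confirm", above
    # 0 candidates, or more than 2: ask for related information instead.
    return "related_query", []
```

The same function can be reused after cross-checking by
passing the re-calculated likelihoods and the corresponding
threshold.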

The remaining 4,000 non-prioritized recognition target
words are subdivided into 8 sets of 500 each, in an order
of the importance levels, according to the specified
dialogue time required for the related information query.
In this example, the recognition processing and the result
adding are carried out by utilizing the dialogue time
during which the retrieval key determination related query
is carried out. Here, it is possible to expect that the
recognition processing for 2,000 candidates (4 sets) can be
completed in one related information query dialogue time.
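
The selection of the prioritized words and the subdivision
of the remainder can be sketched as follows. The function
name split_by_importance and the numeric importance scores
are hypothetical; the text states only that importance
levels are derived from access frequencies, populations and
the like, and that the sizes in this example are 100 and
500.

```python
def split_by_importance(importance, n_priority=100, set_size=500):
    """Split words into prioritized words and ordered non-prioritized sets.

    importance -- {word: importance score}, a higher score is more important
    Returns (prioritized words, list of sets in importance order).
    """
    ranked = sorted(importance, key=importance.get, reverse=True)
    priority = ranked[:n_priority]
    rest = ranked[n_priority:]
    sets = [rest[i:i + set_size] for i in range(0, len(rest), set_size)]
    return priority, sets


# Hypothetical scores for 4,100 cities or towns (town0000 is the top).
importance = {f"town{i:04d}": 4100 - i for i in range(4100)}
priority, sets = split_by_importance(importance)
# Here: 100 prioritized words, then 8 sets of 500 words each.
```

With these sizes, a city ranked 500-th from the top falls
in the first non-prioritized set, so it is among the first
words to be processed during the related information query.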

Now, the case of ascertaining the user input
"Chigasaki, Kanagawa" will be described. The user enters
the name of the city "Chigasaki" of the address that the
user wants to request. Assuming that the importance level
of Chigasaki is 500-th from the top, Chigasaki is not
contained in the prioritized recognition target words.

When the speech retrieval key of "Chigasaki" is
entered from the speech input unit 2, the speech
recognition unit 3-1 of the speech identification unit 3
carries out the speech recognition processing with respect
to the 100 prioritized recognition target words (cities or
towns) in the information database 7-1.

The speech recognition result output unit 3-2 sends
the recognition result for the prioritized recognition
target words to the dialogue control unit 4. An example of
the recognition result is shown in Fig. 5. The result
adjustment unit 4-1 holds this recognition result and sends
it to the dialogue leading unit 4-2. The dialogue leading
unit 4-2 compares the calculated recognition likelihood
with the prescribed threshold for the 100 cities or towns
that are the prioritized recognition target words. In this
example, the prescribed threshold is assumed to be 1,000.
As can be seen from Fig. 5, there is no city or town
candidate that has the recognition likelihood greater
than the prescribed threshold in this case.

Consequently, the dialogue leading unit 4-2 determines
to carry out the related information query, and selects the
attribute to be utilized as the related information from
the information database 7-1. In this example, the
hierarchically adjacent prefecture is selected as the
attribute. When it is determined to carry out the related
information query, the speech recognition unit 3-1 starts
the recognition processing for the remaining
non-prioritized recognition target words. Here, the
recognition processing is carried out for each set of 500
cities or towns that are the non-prioritized recognition
target words. The speech recognition result output unit 3-2
sends the recognition result for each set of 500 cities or
towns to the result adjustment unit 4-1 of the dialogue
control unit 4, and adds it to the recognition result for
the 100 cities or towns that are the prioritized
recognition target words. In this example, the name of the
prefecture is inquired as the related information query,
and the recognition processing for 2,000 candidates (4
sets) is expected to be completed by the time the user's
response "Kanagawa" is entered. An exemplary result
obtained by adding the recognition result for 4 sets of the
non-prioritized recognition target words is shown in
Fig. 6.

The dialogue leading unit 4-2 then commands the query
and response generation unit 4-3 to generate the related
information query for inquiring the name of the prefecture,
and the query message is outputted from the speech output
unit 5.

When the user's response "Kanagawa" is entered from
the speech input unit 2, the recognition processing for the
non-prioritized recognition target words is suspended. In
the speech identification unit 3, the entered prefecture is
recognized at the speech recognition unit 3-1 and the
result is sent from the speech recognition result output
unit 3-2 to the result adjustment unit 4-1 of the dialogue
control unit 4. An example of the recognition result for
the prefecture is shown in Fig. 7.

At this point, the result adjustment unit 4-1 holds
the result for 2,100 cities or towns (100 prioritized
recognition target words plus 2,000 non-prioritized
recognition target words that are recognition processed
during the related information query dialogue time) for
which the recognition processing has been completed so far
(Fig. 6).

The result adjustment unit 4-1 refers to the
information database 7-1, and cross-checks the recognition
results for the city or town candidates and the prefecture
candidates. In this example, the cross-checking processing
is defined to be a multiplication of the recognition
likelihoods of the related attribute values. In other
words, for each city or town candidate, the prefecture to
which this city or town candidate belongs is judged by
referring to the information database 7-1, and the
recognition likelihood of this city or town candidate is
multiplied by the recognition likelihood of the belonging
prefecture. The multiplication result is then held as a new
recognition likelihood. An exemplary result of the
cross-checking is shown in Fig. 8.
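
The multiplication used for the cross-checking in this
example can be written as the following Python sketch. The
function name cross_check and the data layout are
hypothetical, the likelihood values are made up for
illustration and are not those of Fig. 5 to Fig. 8, and
treating a prefecture that is absent from the recognition
result as having likelihood 0 is an assumption.

```python
def cross_check(city_scores, prefecture_scores, prefecture_of):
    """Multiply each city likelihood by that of its own prefecture.

    city_scores       -- {city: recognition likelihood}
    prefecture_scores -- {prefecture: recognition likelihood}
    prefecture_of     -- {city: prefecture}, as read from the database
    """
    return {
        city: score * prefecture_scores.get(prefecture_of[city], 0)
        for city, score in city_scores.items()
    }


# Made-up likelihoods for illustration only.
city_scores = {"Chigasaki": 1400, "Takamatsu": 1500, "Takasaki": 1300}
prefecture_scores = {"Kanagawa": 1800, "Kagawa": 1200, "Gunma": 300}
prefecture_of = {"Chigasaki": "Kanagawa", "Takamatsu": "Kagawa",
                 "Takasaki": "Gunma"}
new_scores = cross_check(city_scores, prefecture_scores, prefecture_of)
# new_scores["Chigasaki"] is 1400 * 1800 = 2,520,000
```

With these made-up values a city whose own likelihood is
not the highest can still rank first once the prefecture
likelihood is taken into account.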

As can be seen from the result of the multiplication
shown in Fig. 8, the top two retrieval key candidates
"Chigasaki, Kanagawa" and "Takamatsu, Kagawa" have the
cross-checked likelihood greater than the threshold
(1,000,000). The dialogue leading unit 4-2 determines to
carry out the confirmation process by presenting these two
candidates sequentially, and commands the query and
response generation unit 4-3 to generate the confirmation
query message. When the response to the presentation of
"Chigasaki, Kanagawa" outputted from the speech output unit
5 is entered from the speech input unit 2, the speech
identification unit 3 carries out the recognition
processing using the YES/NO type template database 7-2 as
the recognition target. As a result of the recognition, the
response indicating affirmation is obtained, so that the
dialogue leading unit 4-2 judges that the target city or
town is determined as "Chigasaki", and outputs a
notification of this fact from the speech output unit 5.
Here, the prefecture can be derived automatically from the
city or town according to the relations among the
attributes in the information database 7-1, so that the
target address is ascertained at this point.

According to the first scheme of the present invention
described in this embodiment, the importance levels are
defined with respect to the attribute values in the number
exceeding the number that can be processed in the real
dialogue processing time, and the attribute values with the
higher importance levels in the number that can be
processed in the real dialogue processing time are selected
and the priority recognition processing for them is carried
out. In this way, the number of the recognition target
words can be seemingly narrowed down so that there is no
need to keep the user awaiting, and moreover, the
recognition result having a tolerable level of accuracy for
the user can be expected as the recognition target words
are narrowed down.

In addition, in the case where the importance levels
are defined according to the past access frequencies, the
possibility for the user's input to be the attribute value
with the high importance level becomes higher when the
access frequencies have the larger bias. Consequently, in
the concrete example described above, for example, in
contrast to the conventional method in which it is only
possible to determine the prefecture and then the city or
town in this order, the user is allowed to enter the city
or town from the beginning, and the higher level prefecture
can also be determined once the city or town is determined,
so that it becomes possible to finish the retrieval
processing only by the input of the city or town. In this
way, it is possible to expect the reduction of the number
of user utterances and the shortening of the overall
dialogue time.

Even when the user input is the non-prioritized
recognition target word, the recognition processing for the
non-prioritized recognition target words is carried out by
utilizing the related information query dialogue time, the
obtained recognition result is added to the already
obtained recognition result, and the attribute value
candidates are narrowed down according to the relevancy
with respect to the obtained related information, so that
it becomes possible to carry out the recognition processing
for the attribute values in the number exceeding the number
that can be processed in the real dialogue processing time
and to compensate for incompleteness of the speech
recognition accuracy without making the user conscious of
it. In contrast to the conventional method in which the
confirmation process is repeated until the correct one is
determined, the related information query is carried out so
that it appears that the attribute value is determined
through the natural dialogues from a viewpoint of the user,
and it also becomes possible to allow the user to
immediately enter the attribute value that seems to be more
suitable for ascertaining the target information
efficiently from a viewpoint of the user (the attribute
value that is more in accord with the user preference).

In the concrete example described above, the case of
determining the address up to the city or town has been
described, but in the case of specifying up to the section,
it is possible to determine the section from 180,000
section candidates by carrying out the similar dialogue
processing using the prefecture and the city or town as the
related information and the sections as the recognition
target attribute values.

In addition, it is also possible to use the speech
input of the attribute values for plural attributes by
selecting the prioritized recognition target words over
plural attributes (levels) from the entire information
database, without limiting to a specific attribute. In this
case, by defining the importance levels with respect to all
of the prefectures, the cities or towns and the sections,
and selecting the prioritized recognition target words from
all levels, it becomes possible to determine the input
attribute value of any level, without specifying the
attribute to be entered first by the user from the system
side. By not specifying the attribute to be entered first
by the user from the system side, it becomes possible to
realize the interactive information retrieval that is even
more in accord with the user preference.

Note that the address determination of the concrete
example described above can be utilized for an address
input in the product delivery, the telephone number search,
or the postal code search, and the interactive information
retrieval method of this embodiment is easily applicable to
the ticket reservation, the target location search by an
automotive global positioning system, and the station
search. In addition, this interactive information retrieval
method is also applicable to the name search by providing a
plurality of attributes such as address, sex, job, age,
telephone number, etc., as the related attribute
information and utilizing them in suitable combination.

Referring now to Fig. 9 to Fig. 22, the second
embodiment directed to the above described second scheme of
the present invention will be described in detail.

Fig. 9 shows an exemplary configuration of a speech
recognition based interactive information retrieval
apparatus in the second embodiment of the present
invention. This interactive information retrieval apparatus
11 comprises a speech input unit 12, a speech
identification unit 13, a dialogue control unit 14, a
speech retrieval key relevancy calculation unit 15, and a
speech output unit 16. The speech identification unit 13
further comprises a speech recognition unit 13-1 and a
speech recognition result output unit 13-2. The dialogue
control unit 14 further comprises a result adjustment unit
14-1, a dialogue leading unit 14-2, and a query and
response generation unit 14-3.

The speech identification unit 13 utilizes a speech
recognition device 18, and the speech output unit 16
utilizes a speech output device 19. Also, the speech
recognition processing for input speech at the speech
identification unit 13 and the next dialogue leading at the
dialogue leading unit 14-2 of the dialogue control unit 14
utilize a speech recognition database 17. The speech
recognition database 17 comprises a plurality of
statistically hierarchized databases 17-1, a retrieval key
attribute database 17-2 that stores attribute items of
retrieval key candidates for all retrieval target speech
retrieval keys, a related information recognition result
table storage area 17-3, and a YES/NO type template
database 17-4.

Fig. 10 shows an exemplary overview of the
statistically hierarchized databases 17-1. Here, the
importance levels according to statistical information such
as past access frequencies by system users are defined with
respect to all speech retrieval key candidates that
constitute speech recognition target words, and the
statistically hierarchized databases 17-1 are formed by
subdividing the speech recognition target data in a
hierarchical structure in an order of the importance
levels.

The speech input unit 12 enters an input speech of the
user into the speech identification unit 13.

In the speech identification unit 13, the speech
recognition unit 13-1 first carries out the speech
recognition processing using the speech recognition device
18 with respect to the input speech entered from the speech
input unit 12. At this point, the speech recognition device
18 refers to the speech recognition database 17 according
to a stage of the dialogue leading to which the input
speech corresponds. Namely, the retrieval key attribute
database 17-2 and the related information recognition
result table storage area 17-3 are referred to when a
response to the retrieval key determination related query
is entered from the speech input unit 12, and the YES/NO
type template database 17-4 is referred to when a response
to the presentation of the speech retrieval key candidate
is entered from the speech input unit 12.

Here, the speech recognition processing is started
in parallel with respect to all levels of the statistically
hierarchized databases 17-1 simultaneously as the speech
retrieval key is entered from the user. Then, the speech
recognition result output unit 13-2 produces a speech
recognition result table in which the retrieval key
candidates for the statistically hierarchized database 17-1
of each level are arranged in a descending order of their
recognition likelihoods, when the speech recognition
processing for the statistically hierarchized database 17-1
of each level is finished. An example of the speech
recognition result table with respect to the highest level
statistically hierarchized database is shown in Fig. 11.
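
The parallel start over all levels and the production of
one result table per level can be pictured with the
following Python sketch. It is a simplified illustration:
the function names and the score callable are hypothetical,
a thread pool merely stands in for the speech recognition
device 18, and the sketch waits for every level before
returning, whereas the text describes the highest level
being passed on as soon as its own table is ready.

```python
from concurrent.futures import ThreadPoolExecutor


def build_result_table(words, score):
    """Return (word, likelihood) rows in descending order of likelihood."""
    rows = [(word, score(word)) for word in words]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def recognize_all_levels(levels, score):
    """Start recognition for every level at once; one table per level.

    levels -- list of word lists, the highest level first
    score  -- hypothetical callable: word -> recognition likelihood
    """
    with ThreadPoolExecutor(max_workers=max(1, len(levels))) as pool:
        futures = [pool.submit(build_result_table, words, score)
                   for words in levels]
        # Results are collected in level order, not completion order.
        return [future.result() for future in futures]
```

Returning the futures themselves instead of the collected
results would let a caller act on the highest level table
while the lower levels are still being processed.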

Because of the difference in the number of recognition
target words contained, the speech recognition processing
and the speech recognition result table production for the
highest level statistically hierarchized database are
finished earliest among the statistically hierarchized
databases 17-1. When the speech recognition result table
for the highest level statistically hierarchized database
is produced, the recognition result is sent to the dialogue
control unit 14. At this point, the speech recognition
processing and the speech recognition result table
production for the lower level statistically hierarchized
databases are continued even when the processing for the
higher level proceeds to the next stage.

The dialogue control unit 14 determines a dialogue
leading to be carried out next by the interactive
information retrieval apparatus 11 with respect to the
user, according to the number of speech retrieval key
leading candidates having the retrieval key likelihood that
exceeds a prescribed likelihood threshold in the speech
recognition result table for the highest level
statistically hierarchized database sent from the speech
recognition result output unit 13-2 of the speech
identification unit 13.

When the speech recognition result table with respect
to the speech retrieval key is received at the result
adjustment unit 14-1, if the number of the speech retrieval
key leading candidates in the recognition target
statistically hierarchized database is less than or equal
to a prescribed number but not zero, the dialogue leading
unit 14-2 determines to carry out the retrieval key
determination related query by referring to the retrieval
key attribute database 17-2 shown in Fig. 12, and the
retrieval key determination related query is generated by
the query and response generation unit 14-3. Here, the next
dialogue leading conditions are determined in advance as
follows, for example.

1. A case where the speech retrieval key leading
candidates greater than the prescribed number are
outputted.

2. A case where there is no speech retrieval key
leading candidate.

3. A case where a candidate that is determined as the
speech retrieval key as a result of the cross-checking of
the related attribute information candidates obtained from
the retrieval key determination related query and the
recognition likelihoods is presented but negated by the
user as not corresponding to the speech retrieval key.

4. A case where there is no candidate which is related
to the related attribute information candidates obtained
from the retrieval key determination related query among
the speech retrieval key leading candidates as a result of
referring to the retrieval key attribute database.
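
The four exemplary conditions listed above can be encoded
as in the following Python sketch. The enumeration, the
function name satisfied_condition and its boolean
parameters are hypothetical, and the order in which the
conditions are tested is an assumption, since the text
lists them without a priority.

```python
from enum import Enum


class NextDialogueCondition(Enum):
    TOO_MANY_CANDIDATES = 1    # condition 1
    NO_CANDIDATE = 2           # condition 2
    PRESENTATION_NEGATED = 3   # condition 3
    NO_RELATED_CANDIDATE = 4   # condition 4


def satisfied_condition(n_leading, prescribed_number,
                        presentation_negated=False,
                        related_candidate_found=True):
    """Return the first satisfied condition, or None if none applies.

    n_leading -- number of leading candidates above the threshold
    """
    if n_leading > prescribed_number:
        return NextDialogueCondition.TOO_MANY_CANDIDATES
    if n_leading == 0:
        return NextDialogueCondition.NO_CANDIDATE
    if presentation_negated:
        return NextDialogueCondition.PRESENTATION_NEGATED
    if not related_candidate_found:
        return NextDialogueCondition.NO_RELATED_CANDIDATE
    return None
```

A return value of None corresponds to the case where the
number of leading candidates is within the prescribed
number but not zero and nothing has been negated, which is
the case handled by the retrieval key determination related
query for the current level.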

Only in the case where the recognition target is the
highest level, when the recognition result in the
recognition target statistically hierarchized database
satisfies any of the above described next dialogue leading
conditions, if no related attribute information has been
obtained yet, the dialogue leading unit 14-2 determines to
carry out a new retrieval key determination related query
and commands the query and response generation unit 14-3 to
generate a query message. In the other cases, the relevancy
of the related attribute information candidates obtained by
then and the speech retrieval key candidates in the
recognition target statistically hierarchized database is
judged by referring to the retrieval key attribute database
17-2 and the related information recognition result table
storage area 17-3, and the normalization and the
cross-checking of the recognition likelihoods are carried
out by accessing the speech retrieval key relevancy
calculation unit 15. Then, the query and response
generation unit 14-3 is commanded to generate a query
message for presenting the speech retrieval key that has
the highest newly calculated retrieval key recognition
likelihood.

During the above operation, the speech recognition
processing and the speech recognition result table
production for the other levels of the statistically
hierarchized databases 17-1 are continually carried out by
the speech recognition unit 13-1 and the speech recognition
result output unit 13-2 of the speech identification unit
13.

Then, the generated response message or query message
is outputted to the user from the speech output unit 16
using the speech output device 19, and a user's response is
obtained at the speech input unit 12 again. The speech
identification unit 13 carries out the speech recognition
processing for the user's response to the response message
or query message entered from the speech input unit 12
again and outputs the result.

By this time, the speech recognition result table
production for the second statistically hierarchized
database is already finished.

When the user's response received from the speech
input unit 12 is a response to the retrieval key
determination related query, the speech recognition result
output unit 13-2 produces a related information recognition
result table from the result of the speech recognition
processing by the speech recognition unit 13-1, and stores
it in the related information recognition result table
storage area 17-3 of the speech recognition database 17,
while also sending the result to the result adjustment unit
14-1. An example of the related information recognition
result table is shown in Fig. 13.

When the related information recognition result table
is received at the result adjustment unit 14-1, the
dialogue leading unit 14-2 determines a policy for the
dialogue according to the number of the speech retrieval
key leading candidates having the retrieval key recognition
likelihood that exceeds the prescribed likelihood threshold
by referring to the speech recognition result table for the
second statistically hierarchized database for which the
speech recognition processing and the speech recognition
result table production are already finished, similarly as
in the dialogue leading for the highest level statistically
hierarchized database.

When the number of the speech retrieval key leading
candidates in the speech recognition result table for the
second statistically hierarchized database is less than or
equal to the prescribed number but not zero, the narrowing
down by the retrieval key determination related query is
carried out, and when any of the next dialogue leading
conditions is satisfied, the relevancy with respect to the
related attribute information candidates obtained by then
is judged and the recognition likelihoods are
cross-checked, and the speech retrieval key candidate with
the highest retrieval key recognition likelihood is
determined as the speech retrieval key.

When the response to the speech retrieval key
presentation is received at the result adjustment unit
14-1, if the response is "Yes", the dialogue leading unit
14-2 determines to generate a response message for
notifying the speech retrieval key determination success
and the query and response generation unit 14-3 generates
this response message, and then the processing is finished.
On the contrary, if the response is "No", the next dialogue
leading condition is satisfied so that the result
adjustment unit 14-1 commands the further dialogue leading
to the dialogue leading unit 14-2 and the dialogue leading
using the recognition result for the third statistically
hierarchized database is started.

In this way, the normalization and the cross-checking
of the recognition likelihoods utilizing the related
attribute information obtained by the retrieval key
determination related query are repeated until the speech
retrieval key is determined, by following the dialogue
policy according to the number of the speech retrieval key
leading candidates.

Fig. 14 shows the processing procedure of the dialogue
control unit 14 in the interactive information retrieval
apparatus 11 of this embodiment.

First, when the speech recognition result table
obtained from the highest level statistically hierarchized
database exists (step S21), if the number of the speech
retrieval key leading candidates having the retrieval key
recognition likelihood that exceeds the prescribed
likelihood threshold is less than or equal to the
prescribed number such as 2 but not zero (step S22), the
retrieval key determination related query is carried out
with respect to the user in order to obtain the related
attribute information according to the speech retrieval key
candidates narrowing down method (step S23), and the speech
recognition processing for the user's response to the
retrieval key determination related query is carried out
using the speech recognition device and the related
information recognition result table is produced (step
S24).

When the obtained related attribute information
candidate is found to be related to the speech retrieval
key leading candidate in the highest level statistically
hierarchized database that is currently being narrowed down
by referring to the retrieval key attribute database (step
S25), the related information recognition likelihood of
that related attribute information and the retrieval key
recognition likelihood of that speech retrieval key leading
candidate are cross-checked to yield a new recognition
likelihood for that speech retrieval key leading candidate
(step S26), and the speech retrieval key candidate having
the highest retrieval key recognition likelihood is
presented to the user and the confirmation process is
carried out (step S27).

Here, the next dialogue leading conditions are
determined in advance as follows, for example.

1. A case where the speech retrieval key leading
candidates greater than the prescribed number are
outputted.

2. A case where there is no speech retrieval key
leading candidate.

3. A case where a candidate that is determined as the
speech retrieval key as a result of the cross-checking of
the related attribute information candidates obtained from
the retrieval key determination related query and the
recognition likelihoods is presented but negated by the
user as not corresponding to the speech retrieval key.

4. A case where there is no candidate which is related
to the related attribute information candidates obtained
from the retrieval key determination related query among
the speech retrieval key leading candidates as a result of
referring to the retrieval key attribute database.
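
The four next dialogue leading conditions listed above can
be sketched as a single predicate. This is only an
illustrative sketch: the function name, the argument names
and the data layout are assumptions introduced here and are
not part of the described apparatus.

```python
from typing import Optional, Sequence


def next_dialogue_leading_condition(
    leading_candidates: Sequence[str],
    prescribed_number: int,
    related_candidates: Optional[Sequence[str]] = None,
    user_negated: bool = False,
) -> Optional[int]:
    """Return the number (1 to 4) of the first satisfied condition, or None.

    leading_candidates: speech retrieval key leading candidates whose
        likelihood exceeds the threshold at the current level.
    related_candidates: leading candidates found to be related to the
        related attribute information obtained so far (None when no
        retrieval key determination related query has been made yet).
    user_negated: True when the presented candidate was answered "No".
    """
    if len(leading_candidates) > prescribed_number:
        return 1  # more leading candidates than the prescribed number
    if len(leading_candidates) == 0:
        return 2  # no leading candidate at all
    if user_negated:
        return 3  # presented candidate negated by the user
    if related_candidates is not None and len(related_candidates) == 0:
        return 4  # no leading candidate related to the attribute information
    return None
```

When the function returns None, the narrowing down at the
current level can continue; any other value corresponds to
moving on to the next level as described for step S28.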

In the case other than the above described case where
the number of the speech retrieval key leading candidates
is less than or equal to the prescribed number but not
zero, when any of the above described four next dialogue
leading conditions is satisfied (step S28), if the already
obtained related attribute information candidate exists
(step S29), the recognition result for the next level is
obtained (step S32) and the relevancy with respect to the
related attribute information candidate is obtained (step
S33). If the already obtained related attribute information
candidate does not exist, the retrieval key determination
related query is newly carried out (step S30) and the
related information recognition result table is produced
(step S31), and then the recognition result for the next
level is obtained (step S32) and the relevancy with respect
to the related attribute information candidate is obtained
(step S33).

When the related attribute information candidate so
obtained is found to be related to the speech retrieval key
leading candidate in the next (lower) level statistically
hierarchized database for which the speech recognition
processing and the speech recognition result table
production are already finished at this point, by referring
to the retrieval key attribute database, the retrieval key
recognition likelihood of the speech retrieval key leading
candidate and the related information recognition
likelihood of the related attribute information are
cross-checked to yield a new retrieval key recognition
likelihood (step S34).

When the number of the speech retrieval key leading
candidates in the next level statistically hierarchized
database is less than or equal to the prescribed number
such as 2 but not zero (step S22), the retrieval key
determination related query is carried out with respect to
the user in order to obtain another related attribute
information according to the speech retrieval key
candidates narrowing down method (step S23), and the speech
recognition processing for the user's response to the
retrieval key determination related query is carried out
using the speech recognition device and the related
information recognition result table is produced (step
S24).

Then, the relevancy with respect to all the related
attribute information candidates obtained by this and
earlier retrieval key determination related queries is
comprehensively evaluated (step S25), the related
information recognition likelihood of the related attribute
information is cross-checked with the retrieval key
recognition likelihood of the speech retrieval key leading
candidate in the next level statistically hierarchized
database which is the current recognition target (step
S26), and the speech retrieval key candidate having the
highest retrieval key recognition likelihood is presented
to the user and the confirmation process is carried out
(step S27).

Then, when any of the above described four next
dialogue leading conditions is satisfied by the speech
recognition result for the next level statistically
hierarchized database (step S28), the next lower
statistically hierarchized database for which the speech
recognition processing and the speech recognition result
table production are already finished at this point is
processed similarly as the higher level statistically
hierarchized database (steps S29, S30, S31, and S32), and
when all the related attribute information candidates
obtained so far are found to be related (step S33), the
recognition likelihoods are cross-checked to yield a new
retrieval key recognition likelihood (step S34).

When the number of the speech retrieval key leading
candidates is less than or equal to the prescribed number
such as 2 but not zero (step S22), the retrieval key
determination related query, the speech retrieval key
candidates narrowing down, and the cross-checking of the
recognition likelihoods and all the related attribute
information candidates obtained by then in the case where
the next dialogue leading condition is satisfied, are
repeated until the speech retrieval key is determined.

In the following, the interactive information
retrieval method of this embodiment will be described for a
concrete example. Here, the case of applying the
interactive information retrieval method of this embodiment
to the determination of a name of a ticket entered by the
user in a ticket reservation system having a task of
reserving a concert ticket will be described.

In the ticket reservation system, it is assumed that
the initial likelihood threshold specified by the system is
3500, and the prescribed number of the speech retrieval key
leading candidates specified by the system for the purpose
of the dialogue leading is 2, such that the retrieval key
determination related query will be carried out with
respect to the user when the number of the speech retrieval
key leading candidates having the recognition likelihood
that exceeds the prescribed likelihood threshold 3500 is
less than or equal to 2, and the recognition target
database will be shifted to the next level when the number
of the speech retrieval key leading candidates is greater
than 2.
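
The policy just described, with the threshold 3500 and the
prescribed number 2, can be sketched as follows. The
function names and the (name, likelihood) table layout are
assumptions made for illustration only. The case of zero
leading candidates is treated here as a shift to the next
level, following the "but not zero" qualification used in
the procedure of Fig. 14.

```python
def select_leading_candidates(result_table, threshold=3500):
    """Keep candidates whose likelihood exceeds the threshold, best first."""
    leading = [(name, score) for name, score in result_table if score > threshold]
    return sorted(leading, key=lambda item: item[1], reverse=True)


def dialogue_policy(result_table, threshold=3500, prescribed_number=2):
    """Return "query" to ask a related query, or "next_level" to shift levels."""
    leading = select_leading_candidates(result_table, threshold)
    if 0 < len(leading) <= prescribed_number:
        return "query"
    return "next_level"


# Hypothetical likelihood values, used only to exercise the policy.
example_table = [("A", 4000), ("B", 3600), ("C", 3400)]
print(dialogue_policy(example_table))  # two leading candidates
```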

Here, the operations in the case where the user makes
a reservation of a concert ticket for "Gustav Leonhardt"
will be described. In this concert ticket reservation
system, the retrieval database has data of 350 concert
performers overall. These 350 concert performers are
subdivided into four levels of the statistically
hierarchized databases according to the access frequencies
(utilizing the popularity ranking based on CD sales of the
past year, for example). As shown in Fig. 15, the first
level (highest level) comprises a list of top 60 performers
that are presumably most popular, the second level
comprises a list of top 150 performers in which 90 next
popular performers are added to 60 performers on the first
level list, the third level comprises a list of top 250
performers in which 100 next popular performers are added
to 150 performers on the second level list, and the fourth
level comprises a list of all 350 performers. The target
speech retrieval key "Gustav Leonhardt" is the 90th in the
popularity ranking so that it does not exist in the first
level statistically hierarchized database.
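
The cumulative structure of the four levels can be
illustrated with a short sketch. The level sizes 60, 150,
250 and 350 and the 90th rank of "Gustav Leonhardt" are
taken from the example above; the function names and the
placeholder performer names are assumptions for
illustration.

```python
LEVEL_SIZES = (60, 150, 250, 350)  # cumulative sizes from the example


def build_hierarchized_levels(ranked_performers, level_sizes=LEVEL_SIZES):
    """Split a popularity-ranked list into cumulative (nested) levels."""
    return [ranked_performers[:size] for size in level_sizes]


def first_level_containing(levels, performer):
    """Return the 1-based index of the highest level holding the performer."""
    for index, level in enumerate(levels, start=1):
        if performer in level:
            return index
    return None


# Hypothetical ranking: placeholder names, with the target at rank 90.
ranking = [f"performer_{rank}" for rank in range(1, 351)]
ranking[89] = "Gustav Leonhardt"
levels = build_hierarchized_levels(ranking)
print(first_level_containing(levels, "Gustav Leonhardt"))
```

Because each level contains all the higher levels, a key
that is missing from the first level is still reachable
once the recognition target is shifted to a lower level.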

When the speech retrieval key "Gustav Leonhardt" is
entered from the speech input unit 12, the speech
recognition processing for all four levels of the
statistically hierarchized databases 17-1 is started
simultaneously in parallel at the speech recognition unit
13-1 of the speech identification unit 13.

The speech recognition result output unit 13-2
produces the speech recognition result table as shown in
Fig. 16 by arranging 60 performers in the list of the
highest level statistically hierarchized database in a
descending order of the retrieval key recognition
likelihood according to the speech recognition result of
the speech recognition unit 13-1, and sends it to the
dialogue control unit 14.
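
A minimal sketch of starting the recognition for all levels
at once and producing a result table sorted in a descending
order of likelihood is given below. It uses a thread pool
from the Python standard library purely as a stand-in for
the parallel processing of the speech recognition unit
13-1; the function names and the toy scoring function are
assumptions and do not represent an actual speech
recognizer.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize(level_words, utterance, score):
    """Score every word of one level and return a table sorted best first."""
    table = [(word, score(utterance, word)) for word in level_words]
    return sorted(table, key=lambda row: row[1], reverse=True)


def recognize_all_levels(levels, utterance, score):
    """Start recognition for every level at once; return one future per level."""
    executor = ThreadPoolExecutor(max_workers=len(levels))
    futures = [executor.submit(recognize, words, utterance, score)
               for words in levels]
    executor.shutdown(wait=False)  # submitted work keeps running meanwhile
    return futures


def toy_score(utterance, word):
    """Placeholder similarity: number of matching leading characters."""
    count = 0
    for a, b in zip(utterance, word):
        if a != b:
            break
        count += 1
    return count
```

The caller can take the result of the first future as soon
as it is ready and continue the dialogue while the larger
levels are still being processed.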

The result adjustment unit 14-1 selects the speech
retrieval key leading candidates having the retrieval key
recognition likelihood that exceeds the prescribed
likelihood threshold 3500 from the speech retrieval key
candidates in the speech recognition result table of Fig.
16. As can be seen in Fig. 16, there are five speech
retrieval key leading candidates "London Symphony", "Boston
Symphony", "New York Philharmonic", "Vienna State Opera"
and "Metropolitan Opera" in this case.

Since the number of the speech retrieval key leading
candidates is greater than the prescribed number 2, the
dialogue leading unit 14-2 judges that the next dialogue
leading condition No. 1 is satisfied, and since no related
attribute information has been obtained at this point, the
dialogue leading unit 14-2 determines to carry out the
retrieval key determination related query in order to
obtain the related attribute information.

As shown in Fig. 17, the retrieval key attribute
database 17-2 stores the attribute values of the attribute
items for all 350 concert performers, such as the concert
date, the day of the week of the concert, the place of the
concert, the prefecture in which the place of the concert
is located, and the style of music to be played in the
concert.

The dialogue leading unit 14-2 determines to inquire
about the concert date as the retrieval key determination
related query according to the retrieval key attribute
database of Fig. 17, and commands the query and response
generation unit 14-3 to generate the retrieval key
determination related query of "What is the date of this
concert?".

The speech output unit 16 presents this retrieval key
determination related query for inquiring about the concert
date to the user using the speech output device 19.

Then, the response "March 3" to this retrieval key
determination related query from the user is entered from
the speech input unit 12.

The speech recognition unit 13-1 carries out the
speech recognition processing using the speech recognition
device 18 for the user's response "March 3" that is sent to
the speech identification unit 13, and the speech
recognition result output unit 13-2 produces the related
information recognition result table as shown in Fig. 18 in
which the concert date candidates are arranged in a
descending order of the recognition likelihood by referring
to the date column of the retrieval key attribute database
17-2 and sends it to the dialogue control unit 14.

By this time, the speech recognition processing and
the speech recognition result table production for the
second statistically hierarchized database (containing 150
performers) are finished. The speech recognition result
table for the second statistically hierarchized database is
shown in Fig. 19.

The result adjustment unit 14-1 of the dialogue
control unit 14 refers to the second statistically
hierarchized database, and commands the speech retrieval
key relevancy calculation unit 15 to carry out the
normalization and multiplication of the retrieval key
recognition likelihood of the speech retrieval key
candidate and the related information recognition
likelihood of the related attribute information candidate,
with respect to the concert date candidates in the related
information recognition result table of Fig. 18 regarding
the concert date which are found to be related to the
speech retrieval key candidates in the speech recognition
result table for the second statistically hierarchized
database.

The speech retrieval key relevancy calculation unit 15
first normalizes the retrieval key recognition likelihoods
in the speech recognition result table of Fig. 19 as
indicated in the rightmost column. Then, the concert date
information of "Hesperion XX: March 30", "Consort of
Musicke: April 10", "London Symphony: May 30", "Gustav
Leonhardt: March 3", and "Boston Symphony: April 10" are
obtained as the related attribute information for the five
speech retrieval key candidates having the recognition
likelihood that exceeds the prescribed likelihood threshold
3500 in the speech recognition result table of Fig. 19, by
referring to the retrieval key attribute database 17-2.
Also, the related information recognition likelihoods for
the concert date candidates in the related information
recognition result table of Fig. 18 are normalized as
indicated in the rightmost column.

Then, when the concert date candidate coincides with
any of the concert dates of the five retrieval key
candidates "Hesperion XX", "Consort of Musicke", "London
Symphony", "Gustav Leonhardt", and "Boston Symphony" having
the recognition likelihood that exceeds the prescribed
likelihood threshold 3500 in the speech recognition result
table of Fig. 19 obtained from the second statistically
hierarchized database, the normalized related information
recognition likelihood in the related information
recognition result table and the normalized retrieval key
recognition likelihood of the speech retrieval key
candidate in the speech recognition result table are
multiplied together, so as to obtain the new recognition
likelihoods for "Hesperion XX", "Consort of Musicke",
"London Symphony", "Gustav Leonhardt", and "Boston
Symphony".

In other words, based on the relevancy of the speech
retrieval key candidates "Hesperion XX", "Consort of
Musicke", "London Symphony", "Gustav Leonhardt" and "Boston
Symphony" in the speech recognition result table of Fig. 19
with respect to the concert dates in the related
information recognition result table, a product of the
normalized retrieval key recognition likelihood of each of
these speech retrieval key candidates and the normalized
related information recognition likelihood of the related
concert date is calculated as the new recognition
likelihood.

In this case, as shown in Fig. 20, the concert date
"March 30" of "Hesperion XX" has the normalized related
information recognition likelihood of 0.0055 in the related
information recognition result table so that the new
recognition likelihood of "Hesperion XX" is given by 0.0080
× 0.0055 = 0.000044. Similarly, the normalized recognition
likelihood 0.0077 of "Consort of Musicke" is multiplied
with the normalized related information recognition
likelihood 0.0080 of "April 10" to yield the new
recognition likelihood of 0.000062. For "London Symphony",
the new recognition likelihood is to be obtained by
multiplying the normalized related information recognition
likelihood of "May 30", but in this example it is assumed
that "May 30" is not included in the recognition target
words so that this date is not recognizable and therefore
the related information recognition result is not obtained,
and for this reason the new recognition likelihood of
"London Symphony" is set equal to 0. The normalized
recognition likelihood 0.0072 of "Gustav Leonhardt" is
multiplied with the normalized related information
recognition likelihood 0.0077 of "March 3" to yield the new
recognition likelihood of 0.000056, and the normalized
recognition likelihood 0.0067 of "Boston Symphony" is
multiplied with the normalized related information
recognition likelihood 0.0080 of "April 10" to yield the
new recognition likelihood of 0.000054.
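
The cross-checking arithmetic of this example can be
reproduced with the following sketch. The normalized
likelihoods and concert dates are those quoted in the text;
the normalized retrieval key recognition likelihood of
"London Symphony" is not stated in the text, so a
placeholder is used (it does not affect the result, which
is 0 because "May 30" is not recognized). The names
`cross_check`, `key_likelihood`, `concert_date` and
`date_likelihood` are assumptions for illustration.

```python
def cross_check(key_likelihood_value, related_likelihood_value):
    """New likelihood is the product; 0 when the attribute was not recognized."""
    if related_likelihood_value is None:
        return 0.0
    return key_likelihood_value * related_likelihood_value


key_likelihood = {
    "Hesperion XX": 0.0080,
    "Consort of Musicke": 0.0077,
    "London Symphony": 0.0075,  # placeholder, not given in the text
    "Gustav Leonhardt": 0.0072,
    "Boston Symphony": 0.0067,
}
concert_date = {
    "Hesperion XX": "March 30",
    "Consort of Musicke": "April 10",
    "London Symphony": "May 30",
    "Gustav Leonhardt": "March 3",
    "Boston Symphony": "April 10",
}
# "May 30" is absent because it is assumed not to be a recognition target.
date_likelihood = {"March 30": 0.0055, "April 10": 0.0080, "March 3": 0.0077}

new_likelihood = {
    name: cross_check(value, date_likelihood.get(concert_date[name]))
    for name, value in key_likelihood.items()
}
for name, value in new_likelihood.items():
    print(f"{name}: {value:.7f}")
```

With these rounded inputs the product for "Hesperion XX"
evaluates to 0.000044 and the product for "Gustav
Leonhardt" to about 0.0000554; the slightly larger value
quoted for the latter presumably reflects unrounded
likelihoods in Fig. 20.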

The result adjustment unit 14-1 sends the result of
calculating the new recognition likelihoods by the
normalization and the cross-checking for the speech
retrieval key candidates of the second statistically
hierarchized database that are selected as described above,
to the dialogue leading unit 14-2.

The dialogue leading unit 14-2 defines the renewed
likelihood threshold for the retrieval key recognition
likelihoods of the second statistically hierarchized
database as 0.2590 according to the normalized new
recognition likelihood. This renewed likelihood threshold
is determined to be a value which is smaller than the
highest likelihood value by more than a prescribed value,
for example. Then, the dialogue leading according to the
number of the speech retrieval key candidates having the
normalized new recognition likelihood that exceeds the
renewed likelihood threshold 0.2590 is started. As can be
seen in Fig. 20, there are two speech retrieval key leading
candidates "Consort of Musicke" and "Gustav Leonhardt"
which have the recognition likelihood exceeding 0.2590 in
this case.
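
The selection against the renewed likelihood threshold can
be sketched as below. The normalization is assumed here to
be a division by the sum of the new recognition
likelihoods; the text does not state the formula, but this
assumption reproduces the value 0.2593 quoted later for
"Gustav Leonhardt". Other normalized values may differ in
the last digits from those of Fig. 20 because the inputs
are rounded. The function names are illustrative.

```python
def normalize(likelihoods):
    """Divide each likelihood by the total so that the values sum to 1."""
    total = sum(likelihoods.values())
    if total == 0:
        return {name: 0.0 for name in likelihoods}
    return {name: value / total for name, value in likelihoods.items()}


def leading_candidates(normalized, threshold):
    """Names whose normalized likelihood exceeds the threshold, best first."""
    names = [name for name, value in normalized.items() if value > threshold]
    return sorted(names, key=normalized.get, reverse=True)


# New recognition likelihoods as quoted in the text.
cross_checked = {
    "Hesperion XX": 0.000044,
    "Consort of Musicke": 0.000062,
    "London Symphony": 0.0,
    "Gustav Leonhardt": 0.000056,
    "Boston Symphony": 0.000054,
}
normalized_new = normalize(cross_checked)
print(leading_candidates(normalized_new, 0.2590))
```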

Since the number of the speech retrieval key leading
candidates in the normalized and cross-checked recognition
result table is less than or equal to the prescribed number
2, the dialogue leading unit 14-2 determines to carry out
the narrowing down of the leading candidates by obtaining a
new related attribute information and determines to inquire
about the place of the concert as the new related attribute
information, by referring to the retrieval key attribute
database 17-2.

The query and response generation unit 14-3 generates
the retrieval key determination related query of "Please
answer the place of this concert" for inquiring about the
place of the concert, and this retrieval key determination
related query is outputted from the speech output unit 16.

Then, the response speech "Casals Hall" by the user is
entered from the speech input unit 12, and sent to the
speech identification unit 13. The speech recognition
processing for the place of the concert candidate is
carried out at the speech recognition unit 13-1 of the
speech identification unit 13, the related information
recognition likelihood of each candidate is calculated at
the speech recognition result output unit 13-2, and the
related information recognition result table is sent to the
dialogue control unit 14. The related information
recognition result table for the place of the concert
obtained as the related attribute information is shown in
Fig. 21. The rightmost column in the related information
recognition result table of Fig. 21 is the normalized
recognition likelihood.

Then, the result adjustment unit 14-1 commands the
speech retrieval key relevancy calculation unit 15 to carry
out the cross-checking of the recognition likelihoods by
judging the relevancy of the speech retrieval key leading
candidates in the second statistically hierarchized
database which is currently a target of the narrowing down,
with respect to both of the related attribute information
including the place of the concert now obtained and the
concert date information that was obtained earlier by
inquiring about the concert date which is now stored in the
related information recognition result table storage area.

The speech retrieval key relevancy calculation unit 15
carries out the cross-checking of the retrieval key
recognition likelihood and the related information
recognition likelihood of each related attribute
information when the speech retrieval key leading
candidates "Consort of Musicke" and "Gustav Leonhardt" are
found to be related to the related attribute information
candidates in the concert date recognition result and the
place of the concert recognition result that is newly
obtained, by referring to the retrieval key attribute
database 17-2.

Namely, in this case, as shown in Fig. 21, "Casals
Hall", "Orchard Hall", "Festival Hall", "Symphony Hall",
"NHK Hall", etc. are obtained as the related attribute
information candidates for the place of the concert. The
normalized new recognition likelihoods for "Consort of
Musicke" and "Gustav Leonhardt" in the rightmost column in
a lower part of Fig. 20 are values obtained by the
normalization and the cross-checking of the retrieval key
recognition likelihoods of the speech retrieval key leading
candidates "Consort of Musicke" and "Gustav Leonhardt" and
the related information recognition likelihoods of the
concert date information, so that by cross-checking the
related information recognition likelihood of the place of
the concert candidate and the values in the rightmost
column in a lower part of Fig. 20, the cross-checking with
the two related attribute information of the concert date
information and the place of the concert information can be
realized. The relevancy of the places of the concert shown
in Fig. 21 with respect to the speech retrieval key leading
candidates "Consort of Musicke" and "Gustav Leonhardt" is
judged from the retrieval key attribute database 17-2.

As a result, as shown in Fig. 22, "Consort of Musicke"
has the related attribute of "Suntory Hall", so that by
multiplying the respective normalized recognition
likelihoods of 0.2897 and 0.0397, the new recognition
likelihood of "Consort of Musicke" becomes 0.01150, whereas
"Gustav Leonhardt" has the related attribute of "Casals
Hall", so that by multiplying the respective normalized
recognition likelihoods of 0.2593 and 0.0833, the new
recognition likelihood of "Gustav Leonhardt" becomes
0.02160.
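
This second cross-checking and the choice of the candidate
with the highest likelihood can be sketched as follows,
using the values quoted in the text. The function and
variable names are assumptions for illustration.

```python
def determine_retrieval_key(candidates, place_likelihood):
    """Multiply each candidate's likelihood by that of its concert place.

    candidates maps a name to (normalized likelihood, place of the concert).
    Returns the best name together with all new likelihoods.
    """
    scores = {
        name: likelihood * place_likelihood.get(place, 0.0)
        for name, (likelihood, place) in candidates.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


place_candidates = {
    "Consort of Musicke": (0.2897, "Suntory Hall"),
    "Gustav Leonhardt": (0.2593, "Casals Hall"),
}
normalized_place_likelihood = {"Suntory Hall": 0.0397, "Casals Hall": 0.0833}

best_key, final_scores = determine_retrieval_key(
    place_candidates, normalized_place_likelihood
)
print(best_key, {name: round(value, 5) for name, value in final_scores.items()})
```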

From this result, the dialogue leading unit 14-2
determines that the speech retrieval key leading candidate
"Gustav Leonhardt" for which the highest retrieval key
recognition likelihood is obtained as a result of the
recognition likelihood cross-checking is the speech
retrieval key, and commands the query and response
generation unit 14-3 to generate a message for presentation
to the user, according to the dialogue leading scheme.

The speech output unit 16 outputs the determined
candidate presentation message of "You wish to attend a
concert of Gustav Leonhardt on March 3 at the Casals Hall,
correct?".

The user's response "Yes" to this presentation is
entered from the speech input unit 12 and sent to the
speech identification unit 13. The speech recognition unit
13-1 carries out the speech recognition processing using
the YES/NO type template database 17-4, and the speech
recognition result output unit 13-2 sends the recognition
result to the dialogue control unit 14.

The result adjustment unit 14-1 sends the recognition
result "Yes" received from the speech recognition result
output unit 13-2 to the dialogue leading unit 14-2, and the
dialogue leading unit 14-2 judges that the correct speech
retrieval key has been determined and determines to finish
the dialogue.

As can be seen from the above description, in the case
of the large scale speech recognition target words, the
recognition processing requires a long time and moreover
the recognition accuracy is not 100% by the current speech
recognition technology so that it is difficult to achieve
the task requested by speech from the user within a
prescribed period of time. Namely, the user must be kept
awaiting while the system carries out the speech
recognition, and when the candidate presented after waiting
turns out to be the recognition error, it is necessary to
repeat the query and response until the correct candidate
is presented or another speech input is requested and the
user is kept awaiting again, so that it is difficult to
achieve the task through natural dialogues similar to
dialogues with the human operator.

According to the second scheme of the present
invention described in this embodiment, the importance
levels are defined for all data according to the
statistical information such as access frequencies, and the
speech recognition database is provided in forms of a
plurality of statistically hierarchized databases in which
data are subdivided and hierarchically structured according
to the importance levels. Also, the virtual real time
performance for the speech recognition processing is
realized by utilizing difference in the recognition time
due to difference in the number of data contained in these
databases.

Also, by setting a threshold for the recognition
likelihood of the speech recognition processing, the
effective narrowing down is realized by inquiring about the
related attribute information when there are a small number
of highly reliable recognition results. When the number of
highly reliable recognition results is greater than a
prescribed number, or when there is no highly reliable
recognition result, or when the first candidate is negated
by the user as not the correct retrieval key, there is a
possibility that the correct retrieval key candidate is not
contained in the highest level statistically hierarchized
database, so that the recognition target is shifted to the
lower level statistically hierarchized database, and
incompleteness of the speech recognition accuracy is
compensated by carrying out the cross-checking with the
related attribute information. Also, by carrying out the
retrieval key determination related query to continue the
dialogue, the natural dialogue is realized while pretending
as if the speech recognition processing is carried out for
all the data.

Note that this interactive information retrieval
method is easily applicable to systems in which the task is
conventionally achieved by the human operator, such as the
seat reservation in which a seat is to be determined using
a price of the seat as attribute, and the station search in
which a station name is to be determined using a route as
attribute. In addition, this interactive information
retrieval method is also applicable to the name search in
which the retrieval key is a name of a person, by providing
a plurality of attributes such as address, sex, job, age,
telephone number, etc. as the related attribute
information and utilizing them in suitable combination.

Referring now to Fig. 23 to Fig. 31, the third
embodiment directed to the above described third scheme of
the present invention will be described in detail.

Fig. 23 shows an exemplary configuration of a speech
recognition based interactive information retrieval
apparatus in the third embodiment of the present invention.
This interactive information retrieval apparatus comprises
a central processing unit (CPU) 110, a memory device 120, a
database 130 and a user device 140. Here, it is also
possible to connect the CPU 110 and the user device 140
through a network.

The CPU 110 is a major component of this interactive
information retrieval apparatus, which comprises an input
request unit 111, a speech recognition unit 112, a
recognition result adjustment unit 113 and a user interface
(speech interface) 114. Note that these elements 111 to 114
can be constructed by utilizing hardware and software of
the general purpose computer in practice. The memory device
120 is a work memory of the CPU 110 which stores various
programs and intermediate processing result data as well as
an attribute value leading candidate group 121 and a
recognition target retrieval key candidate group 122 to be
described below. It is also possible to provide this memory
device 120 as a built-in element of the CPU 110. The
database 130 is an external memory device of the CPU 110,
which comprises a speech recognition database 131, an
attribute database 132, and a YES/NO type template database
133. The user device 140 comprises a speech input unit 141
and a speech output unit 142, and exchanges data with the
CPU 110 basically in forms of speeches.

Fig. 24 shows an exemplary configuration of the speech
recognition database 131, and Fig. 25 shows an exemplary
configuration of the attribute database 132. Note that the
YES/NO type template database 133 in this embodiment
basically stores only "Yes" and "No" data so that its
configuration will not be described here.

As shown in Fig. 24, the speech recognition database
131 contains retrieval key candidates, and attribute values
of attribute items of the retrieval key candidates
separately for each attribute item. In general, a large
scale speech recognition database comprises a number of
retrieval key candidates that cannot be processed within a
prescribed real time. Also, as shown in Fig. 25, the
attribute database 132 contains attribute value candidates
for each attribute item separately. The number of attribute
value candidates is in general set to be a number for
which the recognition can be finished in real time.

Fig. 26 shows a processing procedure for the retrieval
key determination in this embodiment. The outline of the
operation of the interactive information retrieval
apparatus of Fig. 23 will now be described with reference
to Fig. 26.

The input request unit 111 determines an attribute
item to be used in selecting the recognition target words
that can be processed in real time, and notifies the
determined attribute item to the speech recognition unit
112 while also requesting the user to enter the attribute
value of the attribute item through the user interface 114
(step S41). The user listens to the attribute value input
request through the speech output unit 142, and enters the
attribute value from the speech input unit 141 (step S42).

When the attribute value is entered from the user
through the user interface 114, the speech recognition unit
112 carries out the speech recognition processing for the
input attribute value by referring to the attribute
database 132 and calculates the recognition likelihood of
each attribute value candidate for that attribute item
(step S43). At this point, the recognition likelihood is
calculated as a similarity (distance) between the input
attribute value and each attribute value candidate, for
example.

The recognition result adjustment unit 113 receives
each attribute value candidate and its recognition
likelihood from the speech recognition unit 112, extracts
those attribute value candidates which have the recognition
likelihood greater than or equal to a prescribed threshold
as the attribute value leading candidates, and stores them
in the memory device 120 (step S44). Then, the recognition
result adjustment unit 113 searches through the speech
recognition database 131 using these attribute value
leading candidates as keys, extracts retrieval keys that
have the same attribute values as the attribute value
leading candidates for that attribute item, and stores them
as the recognition target retrieval key candidates in the
memory device 120 (step S45).

By the above operation, the recognition target
retrieval key candidates are narrowed down to the number of
words that can be processed in real time. After this, the
control is returned to the input request unit 111 again.

The input request unit 111 requests the user to enter
the retrieval key through the user interface 114 (step
S46). The user listens to the retrieval key input request
through the speech output unit 142, and enters the target
retrieval key from the speech input unit 141 (step S47).

When the retrieval key is entered from the user
through the user interface 114, the speech recognition unit
112 carries out the speech recognition processing for this
input retrieval key using the retrieval key candidates
stored in the memory device 120 as the recognition target,
and calculates the recognition likelihood of each retrieval
key candidate (step S48). At this point, the recognition
likelihood is calculated as a similarity (distance) between
the input retrieval key and each retrieval key candidate,
for example.

The recognition result adjustment unit 113 outputs the
retrieval key candidates in a descending order of the
recognition likelihood to the user through the user
interface 114, and carries out the confirmation process
with the user until the retrieval key is determined (step
S49). More specifically, the recognition result adjustment
unit 113 outputs the retrieval key candidates in a
descending order of the recognition likelihood to the user,
and "Yes" or "No" entered by the user in response is
recognized at the speech recognition unit 112 by referring
to the YES/NO type template database 133, and this result is
given to the recognition result adjustment unit 113. This
operation is repeated until "Yes" is returned from the
user.

Note that the processing algorithm and procedure shown
in Fig. 26 can be provided as a retrieval key determination
program which is described in a language that is executable
by a computer and recorded in a computer readable recording
medium such as a floppy disk, CD-ROM, or memory card, for
example.

In the following, the interactive information
retrieval method of this embodiment will be described for
a concrete example. Here, the case of applying the
interactive information retrieval method of this embodiment
to the determination of an address from 4,000 cities or
towns in Japan will be described.

This city/town determination has 4,000 cities or towns
as the recognition target which cannot be processed in real
time according to the current speech recognition
technology. For this reason, the prefecture to which each
city or town belongs is selected as the attribute item
here. There are only 47 prefectures in Japan, which can be
processed in real time.

Now, the exemplary case of determining "Yokohama" will
be described.

Fig. 27 shows an example of the speech recognition
database 131 to be used for the city/town determination,
and Fig. 28 shows an example of the attribute database 132
to be used in the city/town determination. In the case of
the city/town determination, as shown in Fig. 27, the
speech recognition database 131 contains 4,000 cities or
towns which are the retrieval key candidates, and each city
or town has attribute items such as a prefecture to which
the city or town belongs which is one of 47 prefectures
existing in Japan, a district to which the city or town
belongs which is one of 8 districts existing in Japan, and
whether or not the city or town is located on the seaside.

First, the input request unit 111 inquires of the user
the prefecture, which is the selected attribute item. The
user enters "Kanagawa" which is the prefecture to
which "Yokohama" belongs. The speech recognition unit 112
carries out the speech recognition processing for
"Kanagawa" using the attribute database 132, and calculates
the recognition likelihood of each one of 47 prefectures
(attribute value candidates). Fig. 29 shows an exemplary
recognition result for "Kanagawa" in which the candidates
are arranged in a descending order of the recognition
likelihood.

The recognition result adjustment unit 113 selects
those attribute value candidates which have the recognition
likelihood greater than or equal to the prescribed
likelihood threshold of 0.8 as the attribute value leading
candidates among the recognition candidates for "Kanagawa".
In this example, there are two attribute value leading
candidates "Kagawa" and "Kanagawa". Then, the recognition
result adjustment unit 113 extracts the cities or towns in
Kagawa and Kanagawa as the recognition target. Fig. 30
shows a list of the extracted recognition target.
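
As an illustration only, the selection of the attribute value
leading candidates and the narrowing down of the recognition
target (steps S44 and S45) can be sketched as follows. The
likelihood values, the three database rows and the function
names are assumptions made for this sketch and are not taken
from Fig. 27 to Fig. 30.

```python
from typing import Dict, List


def select_leading_candidates(likelihoods: Dict[str, float],
                              threshold: float) -> List[str]:
    """Step S44: keep every attribute value whose likelihood reaches the threshold."""
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    return [value for value, score in ranked if score >= threshold]


def narrow_retrieval_keys(speech_db: List[dict], attribute_item: str,
                          leading: List[str]) -> List[str]:
    """Step S45: extract retrieval keys whose attribute value is a leading candidate."""
    wanted = set(leading)
    return [row["key"] for row in speech_db if row[attribute_item] in wanted]


# Hypothetical recognition likelihoods for the spoken prefecture "Kanagawa".
prefecture_likelihoods = {"Kagawa": 0.92, "Kanagawa": 0.85, "Nagano": 0.40}

# Toy stand-in for the speech recognition database 131.
speech_db = [
    {"key": "Yokohama", "prefecture": "Kanagawa"},
    {"key": "Takamatsu", "prefecture": "Kagawa"},
    {"key": "Matsumoto", "prefecture": "Nagano"},
]

leading = select_leading_candidates(prefecture_likelihoods, threshold=0.8)
targets = narrow_retrieval_keys(speech_db, "prefecture", leading)
print(leading, targets)
```

Note that no confirmation is requested for the prefecture in
this sketch; every prefecture at or above the threshold simply
contributes its cities or towns to the recognition target.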

Next, the input request unit 111 urges the user to
enter the target city or town which is the retrieval key.
In response, the user enters "Yokohama" from the speech
input unit 141. The speech recognition unit 112 calculates
the recognition likelihood of each city or town in Kagawa
and Kanagawa that is extracted as the recognition target
with respect to the input retrieval key "Yokohama", and
outputs the recognition result. Fig. 31 shows an exemplary
recognition result.

The recognition result adjustment unit 113 then
carries out the confirmation process with the user for the
recognition result sequentially from the top candidate. In
this example, "Yokohama" is outputted as the first (top)
candidate having the highest recognition likelihood, so
that "Yokohama" can be determined by a single confirmation
process.
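
The confirmation process of step S49 can be sketched in the
same hedged manner. The function name, the callback standing
in for the user's "Yes"/"No" response and the likelihood
values are assumptions for illustration; the actual values
are those of Fig. 31.

```python
from typing import Callable, Dict, Optional, Tuple


def confirm_in_order(likelihoods: Dict[str, float],
                     ask: Callable[[str], bool]) -> Tuple[Optional[str], int]:
    """Present candidates in descending likelihood until the user answers "Yes".

    Returns the confirmed retrieval key (or None if every candidate was
    negated) together with the number of confirmation queries issued.
    """
    ranked = sorted(likelihoods, key=likelihoods.get, reverse=True)
    for count, candidate in enumerate(ranked, start=1):
        if ask(candidate):  # True stands for a recognized "Yes"
            return candidate, count
    return None, len(ranked)


# Hypothetical recognition result for the spoken retrieval key "Yokohama".
result = {"Yokohama": 0.95, "Yokosuka": 0.81, "Takamatsu": 0.30}

# The simulated user answers "Yes" only to the intended city.
key, queries = confirm_in_order(result, ask=lambda c: c == "Yokohama")
print(key, queries)
```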

Now, using the example described above, the method of
this embodiment will be compared with the conventional
method for narrowing down the recognition target by
uniquely determining the attribute value by carrying out
the confirmation process even for the attribute value. In
the conventional method, the determination of "Kanagawa"
requires two confirmation processes because "Kanagawa" is
outputted as the second candidate in the recognition result
based on the recognition likelihood calculation with
respect to the input attribute value as shown in Fig. 29,
so that time for two confirmation processes will be
required before the retrieval key input. In contrast, this
time for two confirmation processes is unnecessary in the
method of this embodiment.

In the following, the comparison of the processing
time required in the method of this embodiment and the
conventional method will be described for a concrete
example. It is assumed that, when the recognition target
words are 100 words or less, the speech recognition
accuracy is 70% and the input speech is always outputted as
one of the top three candidates. Namely, it is assumed that
a probability for the input speech to be outputted as the
first candidate is 70%, a probability for the input speech
to be outputted as the second candidate is 20%, and a
probability for the input speech to be outputted as the
third candidate is 10%. It is also assumed that, when the
recognition target words are 300 words or less, the speech
recognition accuracy is 60% and the input speech is always
outputted as one of the top four candidates. In this case,
it is assumed that a probability for the input speech to be
outputted as the first candidate is 60%, a probability for
the input speech to be outputted as the second candidate is
25%, a probability for the input speech to be outputted as
the third candidate is 10%, and a probability for the input
speech to be outputted as the fourth candidate is 5%.

The attribute item is selected such that the number of
attribute value candidates becomes 50 or less, and the
number of retrieval key candidates belonging to each
attribute value becomes 100 or less. Here, for the sake of
simplicity, the speech recognition processing time T is
regarded approximately as equal to 0 in the case of the
number of words that can be processed in real time. The
number of words that can be processed in real time is
assumed to be 300. A time required for one confirmation
process is assumed to be S (sec).

In the conventional method, the attribute value
recognition is completed in real time T (sec) because the
number of attribute value candidates is 50, and at a time
of determining the attribute value by the confirmation
process carried out sequentially in a descending order of
the recognition likelihood, the number of the confirmation
processes required is one (required time of S (sec)) at 70%
probability, two (required time of 2S (sec)) at 20%
probability, and three (required time of 3S (sec)) at 10%
probability so that the attribute value determination will
require 0.7 x S + 0.2 x 2S + 0.1 x 3S = 1.4S (sec). Then,
the recognition target is narrowed down by using the
attribute value and the retrieval key input is urged to the
user. Here, the speech recognition processing is completed
in real time T (sec) because the number of data belonging
to one attribute value is 100 or less. From the assumption
on the recognition accuracy, the number of the confirmation
processes required in the retrieval key determination is
one at 70% probability, two at 20% probability, and three
at 10% probability, so that the retrieval key determination
will require 1.4S (sec) on average similarly as the
attribute value determination. Thus the retrieval key
recognition and determination will require T + 1.4S = 1.4S
(sec). Consequently, under the above assumption, the
overall time required for the retrieval key determination
will be 1.4S + 1.4S = 2.8S (sec).

On the other hand, in the method of this embodiment,
under the same speech recognition accuracy, the attribute
value recognition will require T (sec), and the correct
attribute value is always outputted in the top three
candidates because the number of attribute value candidates
is 50 or less, so that top three attribute values will be
stored as the attribute value leading candidates. Then, the
retrieval keys belonging to these top three attribute value
leading candidates are extracted and the retrieval key
input is urged. Here, the number of recognition target
retrieval keys becomes 300 or less because the number of
data belonging to one attribute value is 100 or less. The
recognition for the retrieval key is completed in real time
T (sec), but because the number of the recognition target
retrieval keys is 300, the number of the confirmation
processes required in the retrieval key determination is
one (required time of S (sec)) at 60% probability, two
(required time of 2S (sec)) at 25% probability, three
(required time of 3S (sec)) at 10% probability, and four
(required time of 4S (sec)) at 5% probability. Thus the
retrieval key determination will require 0.6 x S + 0.25 x
2S + 0.1 x 3S + 0.05 x 4S = 1.6S (sec), and the retrieval
key recognition and determination will require T + 1.6S =
1.6S (sec). Consequently, the overall time required since
the start of the user input until the retrieval key
determination will be 1.6S (sec) because the time required
for the attribute value determination is T = 0 (sec).
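
The arithmetic of this comparison can be checked with a short
sketch. It assumes, as in the calculation above, that T = 0
and that the k-th candidate is the correct one with the stated
probabilities; the function and variable names are
illustrative only.

```python
def expected_confirmations(rank_probabilities):
    """Expected number of confirmation processes, in units of S.

    rank_probabilities[k-1] is the probability that the correct entry
    is outputted as the k-th candidate.
    """
    return sum(rank * p for rank, p in enumerate(rank_probabilities, start=1))


UP_TO_100_WORDS = [0.70, 0.20, 0.10]        # top three candidates
UP_TO_300_WORDS = [0.60, 0.25, 0.10, 0.05]  # top four candidates

# Conventional method: confirm the attribute value, then the retrieval key.
conventional = 2 * expected_confirmations(UP_TO_100_WORDS)

# Method of this embodiment: confirm the retrieval key only.
embodiment = expected_confirmations(UP_TO_300_WORDS)

print(f"conventional: {conventional:.1f}S, embodiment: {embodiment:.1f}S")
```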

From this result, it can be seen that the method of
this embodiment can reduce the retrieval key determination
processing time considerably compared with the conventional
method that narrows down the recognition target after
uniquely determining the attribute value.

As can be seen from the above description, when the
retrieval key candidates to be entered by speech from the
user are a large number of words that cannot be processed
in real time, because there is a limit to the number of
words that can be processed in real time and the
recognition accuracy is lowered for the larger number of
words according to the current speech recognition
technology, the real time processing is realized by
narrowing down the recognition target by using the
attribute value of the attribute item of the retrieval key.
However, the recognition accuracy cannot become 100% even
when the recognition target is narrowed down, so that the
confirmation process with the user is necessary in order to
determine the input of the user.

The attribute value input is an indispensable process
for the purpose of realizing the real time speech
recognition processing from a viewpoint of the system, but
the inability to immediately enter the retrieval key that
the user really wants to request appears circuitous to the
user, and the repetition of two confirmation processes for
the attribute value determination and the retrieval key
determination causes further stress on the user.

In the third scheme of the present invention described
in this embodiment, the retrieval key determination is
realized without carrying out the attribute value
determination, so that the confirmation process for the
attribute value determination is eliminated and the circuity
due to the repetition of the confirmation processes and the
processing time required for the retrieval key
determination are reduced, and thereby stress on the user
is reduced. This scheme is particularly effective for the
input speech determination using a large scale database as
the recognition target.






Referring now to Fig. 32 to Fig. 39, the fourth
embodiment directed to the above described fourth scheme of
the present invention will be described in detail.

Fig. 32 shows an exemplary configuration of a speech
recognition based interactive information retrieval
apparatus in the fourth embodiment of the present
invention. This interactive information retrieval apparatus
201 comprises a speech input unit 202, a recognition target
data extraction unit 203, a speech recognition unit 204, a
recognition candidate output unit 205, and a speech output
unit 206. The recognition target data extraction unit 203
utilizes a recognition database 207 that comprises a speech
recognition database 207-1 and a response database 207-2.
The speech recognition unit 204 utilizes a speech
recognition device 208, and the speech output unit 206
utilizes a speech output device 209.

Fig. 33 shows an exemplary overview of the speech
recognition database 207-1 that is to be recorded in a
recording medium.

The speech recognition database 207-1 is formed in two
hierarchical levels for generic concepts and specific
concepts, where the retrieval key to be requested by the
user is a lower level data. The higher level has the number
of words that can be processed in real time, while the
lower level has a large number of words that cannot be
processed in real time. Every lower level data has a
dependency with respect to one higher level data, and the
number of the lower level data that are dependent on one
higher level data is set to be the number that can be
processed in real time. Also, by utilizing the bias in the
access frequencies for the large number of the lower level
data, as many of the lower level data as the number that
can be processed in real time are selected in a descending
order of the access frequency, and marked "H" to form a
high frequency access data group that is to be stored in
another memory separately from the other lower level data
that are marked "L".
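
The marking of the high frequency access data group can be
sketched as follows. The access counts and the function name
are hypothetical, and a real time limit of three words is used
only to keep the example small.

```python
from typing import Dict


def mark_high_frequency(access_counts: Dict[str, int],
                        real_time_limit: int) -> Dict[str, str]:
    """Mark the most frequently accessed lower level data "H", the rest "L".

    real_time_limit is the number of words that can be processed in real time.
    """
    ranked = sorted(access_counts, key=access_counts.get, reverse=True)
    high = set(ranked[:real_time_limit])
    return {key: ("H" if key in high else "L") for key in access_counts}


# Hypothetical access frequencies for four lower level data.
counts = {"Yokohama": 900, "Sapporo": 750, "Yokokawa": 12, "Hakodate": 300}
marks = mark_high_frequency(counts, real_time_limit=3)
print(marks)
```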

In the interactive information retrieval apparatus
201, when the speech is entered by the user at the speech
input unit 202, the identification of data to be selected
as the recognition target is carried out at the recognition
target data extraction unit 203 according to the input
speech.

Fig. 34 shows a processing procedure of the
interactive information retrieval apparatus 201 in this
embodiment.

When the retrieval key is entered by the user at the
speech input unit 202 (step S51), the recognition target
data extraction unit 203 specifies the high frequency
access data group as the recognition target data, among the
lower level data in the speech recognition database 207-1
for which the recognition and the retrieval are to be
carried out at higher priority first (step S52).

Then, the speech recognition processing is carried out
at the speech recognition unit 204 (step S53), and the
recognition result is outputted at the recognition
candidate output unit 205 (step S54). At this point, the
recognition candidates are outputted in a descending order
of the calculated recognition likelihood. The speech output
unit 206 outputs the confirmation query while presenting
the outputted retrieval key candidates in a descending
order of the recognition likelihood to the user (step S55).
Here, the number of times for which the confirmation query
can be outputted in the confirmation process is specified
in advance by the interactive information retrieval
apparatus 201.

When a response to the confirmation query is entered
from the speech input unit 202 (step S56), the recognition
target data extraction unit 203 specifies the response
database 207-2 of the recognition database 207 as the
recognition target data, and when the response "Yes" is
recognized at the speech recognition unit 204 and the
recognition candidate output unit 205, the retrieval key
determination success is notified to the user at the speech
output unit 206 (step S57).

When the prescribed number of the confirmation queries
for the retrieval key candidates are all negated by the
user (the response "No" is recognized at the speech
recognition unit 204 and the recognition candidate output
unit 205) (step S58 NO), the speech output unit 206 carries
out the related query for inquiring a generic concept of
the retrieval key that is contained in the higher level
data to the user (step S59).

When the response to the related query is entered from
the speech input unit 202 and recognized by the speech
recognition unit 204, the recognition target data
extraction unit 203 extracts the lower level data that are
dependent on the recognized generic concept as the
recognition target from the speech recognition database
207-1, and then the retrieval key originally entered by the
user is recognized at the speech recognition unit 204 again
(step S60). Then the confirmation query for the retrieval
key candidates that are outputted in a descending order of
the recognition likelihood at the recognition candidate
output unit 205 is outputted from the speech output unit
206 (step S61). The confirmation process is repeated until
the response "Yes" is obtained from the user with respect
to the confirmation query (step S62). When the response
"Yes" is recognized, the retrieval key determination success
is notified to the user (step S63).
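
The flow of steps S51 to S63 can be sketched as follows, under
loud simplifying assumptions: string similarity from Python's
difflib stands in for the recognition likelihood of an actual
speech recognition device, the recognition of the generic
concept is reduced to a callback that returns it directly, and
all names and vocabularies are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


def rank(utterance: str, vocabulary: List[str]) -> List[str]:
    """Stand-in recognizer: order the vocabulary by similarity to the utterance."""
    def score(word: str) -> float:
        return SequenceMatcher(None, utterance.lower(), word.lower()).ratio()
    return sorted(vocabulary, key=score, reverse=True)


def determine_key(utterance: str, high_freq: List[str],
                  by_generic: Dict[str, List[str]],
                  ask_confirm: Callable[[str], bool],
                  ask_generic: Callable[[], str],
                  max_queries: int = 3) -> Optional[str]:
    # Steps S52 to S58: try the high frequency access data group first,
    # with a prescribed upper limit on the number of confirmation queries.
    for candidate in rank(utterance, high_freq)[:max_queries]:
        if ask_confirm(candidate):
            return candidate
    # Step S59: related query for the generic concept (e.g. the prefecture).
    generic = ask_generic()
    # Steps S60 to S63: recognize the original utterance again, this time
    # against the lower level data dependent on that generic concept.
    for candidate in rank(utterance, by_generic[generic]):
        if ask_confirm(candidate):
            return candidate
    return None


asked: List[str] = []


def ask_confirm(candidate: str) -> bool:
    asked.append(candidate)           # record every confirmation query
    return candidate == "Yokokawa"    # the simulated user wants "Yokokawa"


result = determine_key(
    "Yokokawa",
    high_freq=["Yokohama", "Yokosuka", "Yokoyama", "Sapporo", "Hakodate"],
    by_generic={"Gunma": ["Maebashi", "Takasaki", "Yokokawa"]},
    ask_confirm=ask_confirm,
    ask_generic=lambda: "Gunma",
)
print(result, len(asked))
```

In this sketch the user states the desired retrieval key
first and is asked for the generic concept only after the
prescribed number of confirmation queries has been negated.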

In the following, the interactive information
retrieval method of this embodiment will be described for
a concrete example. Here, the case of applying the
interactive information retrieval method of this embodiment
to the determination of an address from cities or towns in
Japan will be described.

In the city/town determination, it is assumed that the
number of times for which the confirmation query can be
outputted in the confirmation process for the retrieval key
candidates in a descending order of the recognition
likelihood is set to be 3 when the recognition target is
the high frequency access data group.

Fig. 35 shows an exemplary speech recognition database
to be used in the city/town determination. Here, the cities
or towns that can be the retrieval keys are the lower level
data in the speech recognition database, and the
prefectures in Japan are selected as the higher level data.
There are 47 prefectures in Japan which is the number that
can be processed in real time, every city or town has a
prefecture to which it belongs, and the number of cities or
towns belonging to one prefecture is 50 at most which can
be processed in real time. Also, the access frequencies in
the case of using the city/town determination for telephone
number guidance or the like are utilized as the access
frequencies for the cities or towns, and 50 (the number
that can be processed in real time) cities or towns in a
descending order of the access frequency are specified as
the high frequency access data group. Fig. 36 shows an
example of cities or towns constituting the high frequency
access data group.

First, the exemplary case of determining "Yokohama"
will be described.

When "Yokohama" is entered from the speech input unit
202, the recognition target data extraction unit 203
extracts the cities or towns belonging to the high
frequency access data group (such as Sapporo, Hakodate,
Chuo, Kagoshima, etc. in Fig. 35) as the recognition
target data among the lower level data in the speech
recognition database 207-1. Here, "Yokohama" is data that
is contained in the high frequency access data group. The
result of the speech recognition processing at the speech
recognition unit 204 is outputted at the recognition
candidate output unit 205 in a descending order of the
recognition likelihood. Fig. 37 shows an exemplary output
result in which the first candidate is "Yokosuka", the
second candidate is "Yokohama", the third candidate is
"Yotsuffl" and so on.

The speech output unit 206 outputs the confirmation
query for the retrieval key candidates in a descending
order of the recognition likelihood to the user. Since
"Yokohama" is the second candidate in Fig. 37, "Yokohama"
can be determined as a correct one by two confirmation
queries.

Next, another exemplary case of determining "Yokokawa"
will be described. Here, "Yokokawa" is data that is not
contained in the high frequency access data group.

When "Yokokawa" is entered from the speech input unit
202, the recognition target data extraction unit 203
extracts the high frequency access data group as the
recognition target data, and the speech recognition
processing is carried out at the speech recognition unit
204. Fig. 38 shows an exemplary result outputted from the
recognition candidate output unit 205.

Then, according to the result of Fig. 38, the speech
output unit 206 outputs the confirmation query for the
retrieval key candidates "Yokohama", "Yokosuka", and
"Yokoyama" in this order. In this case, the response "No"
is entered from the speech input unit 202 for all the
confirmation queries, so that the interactive information
retrieval apparatus 201 urges the user to enter the
prefecture to which the retrieval key "Yokokawa" belongs
from the speech output unit 206. When the user's response
"Gunma" is entered from the speech input unit 202, the
recognition target data extraction unit 203 extracts all the
lower level data belonging to Gunma, that is 41 cities or
towns in Gunma, as the recognition target data. Then, the
speech recognition processing for "Yokokawa" is carried out
at the speech recognition unit 204 again, and the retrieval
key candidates are outputted from the recognition candidate
output unit 205. Fig. 39 shows an exemplary output result
in this case.

Then, the confirmation query for the retrieval key
candidates in a descending order of the recognition
likelihood is outputted at the speech output unit 206.
Since "Yokokawa" is the first candidate in Fig. 39,
"Yokokawa" can be determined as a correct one by one
confirmation query.

As can be seen from the above description, in the case
of using a large number of speech recognition target words,
there is a limit to the number of words that can be
processed in real time and the recognition accuracy is
lowered for the larger number of words according to the
current speech recognition technology, so that the
conventional system forces the user to first enter an
efficient retrieval assist key by which the recognition
target can be narrowed down to a small number of retrieval
target words that can be recognized by the system at good
accuracy in real time, rather than the retrieval key that
the user really wants to request.

According to the fourth scheme of the present
invention described in this embodiment, the speech
recognition database is formed in two hierarchical levels,
where the retrieval keys that can be requested by the user
are set as the lower level data and the retrieval assist
keys in the number of words that can be processed in real
time with respect to which the lower level data have
dependency are set as the higher level data. Moreover, the
higher level data are selected such that the number of the
lower level data (retrieval key candidates) that are
dependent on one higher level data is the number that can
be processed in real time, and the number of the lower
level data with higher access frequencies that can be
processed in real time are stored separately in another
memory, such that the high frequency access data group is
selected as the retrieval and recognition target at higher
priority.

Using this specifically devised database
configuration, if the retrieval key is contained in the
high frequency access data group, the retrieval key
determination can be realized in real time, using only the
input of the retrieval key that the user really wants to
request, without carrying out any related query for
inquiring the generic concept as the retrieval assist key.
Even when the retrieval key is not contained in the high
frequency access data group, the retrieval key that the
user really wants to request is entered first, and then the
assisting generic concept is entered, which is natural from
a viewpoint of the user, rather than forcing the user to
start from the assisting query for inquiring the generic
concept first in order to realize the effective narrowing
down in the system as in the conventional system.


As described, according to the first scheme of the
present invention, it becomes possible to provide a speech
recognition based interactive information retrieval scheme
capable of ascertaining the target information by
determining the attribute values without making the user
conscious of the time required for the speech recognition
processing and the retrieval, and without causing
unnatural dialogues with the user due to incompleteness of
the speech recognition processing. In this scheme, in a
process for determining the attribute value necessary in
order to ascertain the target information, the recognition
target attribute value can be determined even when the
number of attribute values exceeds the number that can be
processed within a prescribed period of time, by utilizing
a method for narrowing down the recognition target words
that can return a response with a tolerable level of
accuracy for the user without giving the user a
feeling of being kept waiting, and a method for
ascertaining input that can realize the reduction or the
omission of the confirmation processes.

Also, according to the second scheme of the present
invention, it becomes possible to provide an operator-less
speech recognition based interactive information retrieval
scheme using speech dialogues based on the dialogue control
which is capable of determining the retrieval key entered
by the user through natural dialogues. In this scheme, the
retrieval key can be determined using a large scale
database having the retrieval target words that cannot be
processed within a prescribed period of time, without
making the user conscious of the time required for the
speech recognition processing and the database matching,
and without causing unnatural dialogues with the user due
to incompleteness of the speech recognition processing,
such that the task of determining the speech retrieval key
entered by the user can be achieved in the operator-less
speech recognition based interactive information retrieval
system, without making the user conscious of the waiting
time, through dialogues that have both quickness and
naturalness equivalent to a human operator based system.

Also, according to the third scheme of the present
invention, it becomes possible to provide a speech
recognition based interactive information retrieval scheme
using a large scale database as the recognition target,
which is capable of ascertaining a retrieval key entered by
the speech input while reducing stress on the user. In this
scheme, the retrieval key is ascertained without carrying
out the attribute value determination, such that the
confirmation process for the purpose of determining the
attribute value is eliminated and the circuity due to the
confirmation process is eliminated, while the processing
time required for the retrieval key determination is
shortened.

Also, according to the fourth scheme of the present
invention, it becomes possible to provide a speech
recognition based interactive information retrieval scheme
capable of realizing the retrieval that has both quickness
and naturalness in determining the retrieval key from a
large scale database. In this scheme, the recognition and
the retrieval are carried out without making the user
conscious of the waiting time and incompleteness of the
recognition accuracy during the recognition even when the
retrieval key that the user really wants to request is
entered immediately at the beginning, by utilizing the bias
in the access frequencies of data in the large scale
database, in the retrieval aimed at determining the
retrieval key entered by the user using the large scale
database as the recognition target.

Thus, according to the speech recognition based
interactive information retrieval scheme of the present
invention, the ambiguity in the recognition result of the
initially entered speech input and the ambiguity in the
recognition result of the subsequent speech input entered
in response to the related information query can be
simultaneously resolved by the cross-checking process for
checking the relevancy of these recognition results, and
this is a factor that contributes to the capability of
returning an appropriate response to the user in short
time.

It is to be noted that the above described embodiments
according to the present invention may be conveniently
implemented using a conventional general purpose digital
computer programmed according to the teachings of the
present specification, as will be apparent to those skilled
in the computer art. Appropriate software coding can
readily be prepared by skilled programmers based on the
teachings of the present disclosure, as will be apparent to
those skilled in the software art.

In particular, the interactive information retrieval
apparatus of each of the above described embodiments can be
conveniently implemented in a form of a software package.

Such a software package can be a computer program
product which employs a storage medium including stored
computer code which is used to program a computer to
perform the disclosed function and process of the present
invention. The storage medium may include, but is not
limited to, any type of conventional floppy disks, optical
disks, CD-ROMs, magneto-optical disks, ROMs, RAMs, EPROMs,
EEPROMs, magnetic or optical cards, or any other suitable
media for storing electronic instructions.

It is also to be noted that, besides those already
mentioned above, many modifications and variations of the
above embodiments may be made without departing from the
novel and advantageous features of the present invention.
Accordingly, all such modifications and variations are
intended to be included within the scope of the appended
claims.

WHAT IS CLAIMED IS:



1. A method of speech recognition based interactive
information retrieval for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising the steps of:

(a) storing retrieval key candidates that constitute a
number of data that cannot be processed by the speech
recognition processing in a prescribed processing time, as
recognition target words in a speech recognition database,
the recognition target words being divided into prioritized
recognition target words that constitute a number of data
that can be processed by the speech recognition processing
in the prescribed processing time and that have relatively
higher importance levels based on statistical information
defined for the recognition target words, and
non-prioritized recognition target words other than the
prioritized recognition target words;

(b) requesting the user by a speech dialogue with the user
to enter a speech input indicating the retrieval key, and
carrying out the speech recognition processing for the
speech input with respect to the prioritized recognition
target words to obtain a recognition result;

(c) carrying out a confirmation process using a speech
dialogue with the user according to the recognition result
to determine the retrieval key, when the recognition result
satisfies a prescribed condition for judging that the
retrieval key can be determined only by a confirmation
process with the user;

(d) carrying out a related information query using a
speech dialogue with the user to request the user to enter
another speech input indicating a related information of
the retrieval key, when the recognition result does not
satisfy the prescribed condition;

(e) carrying out the speech recognition processing for the
another speech input to obtain another recognition result,
and adjusting the recognition result according to the
another recognition result to obtain adjusted recognition
result; and

(f) repeating the step (c) or the steps (d) and (e) using
the adjusted recognition result in place of the recognition
result, until the retrieval key is determined.
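
For illustration, the control flow of steps (b) through (f) can be
sketched as below. This is a sketch under stated assumptions and does
not limit the claim: the function names, the callback interfaces, the
particular form of the prescribed condition (at least one and at most
max_leading candidates above a threshold), the multiplication used for
the adjustment, and the max_rounds bound are all hypothetical choices
made for the example.

```python
from typing import Callable, Dict, List, Optional

Scores = Dict[str, float]  # candidate -> recognition likelihood


def leading(scores: Scores, threshold: float) -> List[str]:
    """Candidates whose likelihood exceeds the threshold, best first."""
    hits = [c for c, p in scores.items() if p > threshold]
    return sorted(hits, key=lambda c: scores[c], reverse=True)


def determine_key(
    result: Scores,                       # step (b): result over prioritized words
    confirm: Callable[[str], bool],       # step (c): yes/no confirmation dialogue
    query_related: Callable[[], Scores],  # steps (d)-(e): related information result
    related_of: Dict[str, str],           # key candidate -> its related information
    threshold: float = 0.5,
    max_leading: int = 2,
    max_rounds: int = 3,                  # assumed bound so the sketch terminates
) -> Optional[str]:
    for _ in range(max_rounds):
        top = leading(result, threshold)
        if 0 < len(top) <= max_leading:   # assumed form of the prescribed condition
            for cand in top:
                if confirm(cand):
                    return cand           # retrieval key determined
            # every leading candidate was negated: drop them before continuing
            result = {c: p for c, p in result.items() if c not in top}
        related = query_related()         # step (d) and recognition of step (e)
        adjusted: Scores = {}
        for cand, p in result.items():    # adjustment of step (e)
            rel = related_of.get(cand)
            p_rel = related.get(rel, 0.0) if rel is not None else 0.0
            adjusted[cand] = p * p_rel
        result = adjusted                 # step (f): use the adjusted result
    return None
```

With three near-tied candidates, the first round asks for related
information instead of confirming, and the adjusted result leaves a
single candidate to confirm in the second round.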



2. The method of claim 1, wherein the step (d) also
carries out the speech recognition processing for the
speech input with respect to as many of the non-prioritized
recognition target words as a number of data that can be
processed by the speech recognition processing in the
prescribed processing time to obtain additional recognition
result, while carrying out the related information query
using the speech dialogue with the user, and

the step (e) also adjusts the recognition result by
adding the additional recognition result.

3. The method of claim 2, wherein the non-prioritized
recognition target words are subdivided into a plurality of
sets each containing a number of recognition target words
that can be processed by the speech recognition processing
in the prescribed processing time, and

the step (d) carries out the speech recognition
processing for the speech input with respect to the
plurality of sets in an order of the importance levels of
the recognition target words contained in each set.
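
The division into prioritized words and ordered sets of
non-prioritized words can be sketched as follows. This is only an
illustrative sketch: the function name partition_targets, the use of
access frequency as the importance level, and the single capacity
parameter standing for the number of words recognizable in the
prescribed processing time are assumptions.

```python
from typing import Dict, List, Tuple


def partition_targets(access_freq: Dict[str, int],
                      capacity: int) -> Tuple[List[str], List[List[str]]]:
    """Split the words into a prioritized set and ordered non-prioritized
    sets, each holding at most `capacity` words."""
    if capacity < 1:
        raise ValueError("capacity must be positive")
    # Rank by importance level (here: access frequency, highest first).
    ranked = sorted(access_freq, key=lambda w: access_freq[w], reverse=True)
    prioritized = ranked[:capacity]
    rest = ranked[capacity:]
    # Later sets hold progressively less important words.
    sets = [rest[i:i + capacity] for i in range(0, len(rest), capacity)]
    return prioritized, sets
```

Processing the returned sets from first to last corresponds to taking
the sets in an order of the importance levels of their words.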

4. The method of claim 1, wherein the recognition result
indicates recognition retrieval key candidates and their
recognition likelihoods and the another recognition result
indicates recognition related information candidates and
their recognition likelihoods, and

the step (e) adjusts the recognition result by
calculating new recognition likelihoods for the recognition
retrieval key candidates according to recognition
likelihoods for the recognition retrieval key candidates
indicated in the recognition result and recognition
likelihoods for the recognition related information
candidates indicated in the another recognition result.

5. The method of claim 4, wherein the step (e) calculates
the new recognition likelihoods for the recognition
retrieval key candidates by multiplying a recognition
likelihood of each recognition retrieval key candidate with
a recognition likelihood of a corresponding recognition
related information candidate.
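
The multiplication of likelihoods can be sketched as follows. The
function name, the related_of mapping from each retrieval key
candidate to its related information, and the choice of a zero
likelihood when no corresponding related information candidate was
recognized are assumptions for illustration.

```python
from typing import Dict


def adjust_likelihoods(key_scores: Dict[str, float],
                       related_scores: Dict[str, float],
                       related_of: Dict[str, str]) -> Dict[str, float]:
    """New likelihood = key likelihood x likelihood of the corresponding
    related information candidate (0.0 if none was recognized)."""
    adjusted: Dict[str, float] = {}
    for key, p_key in key_scores.items():
        rel = related_of.get(key)
        p_rel = related_scores.get(rel, 0.0) if rel is not None else 0.0
        adjusted[key] = p_key * p_rel
    return adjusted
```

Two key candidates with equal likelihoods are thereby separated when
their related information candidates have different likelihoods.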

6. The method of claim 1, wherein the recognition result
indicates recognition retrieval key candidates and their
recognition likelihoods, and

the step (c) judges that the recognition result
satisfies the prescribed condition, when a number of
recognition retrieval key leading candidates which have
recognition likelihoods that are exceeding a prescribed
likelihood threshold is less than or equal to a prescribed
number but not zero.
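
The prescribed condition can be written as a small predicate. The
function and parameter names below are hypothetical.

```python
from typing import Dict, List


def leading_candidates(scores: Dict[str, float], threshold: float) -> List[str]:
    """Candidates whose recognition likelihood exceeds the threshold."""
    return [c for c, p in scores.items() if p > threshold]


def satisfies_condition(scores: Dict[str, float],
                        threshold: float,
                        max_candidates: int) -> bool:
    """True when the number of leading candidates is at most
    `max_candidates` but not zero."""
    n = len(leading_candidates(scores, threshold))
    return 0 < n <= max_candidates
```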

7. The method of claim 1, wherein the statistical
information used at the step (a) is access frequencies of
the retrieval key candidates.

8. The method of claim 1, wherein the prescribed
processing time used at the step (a) is a real dialogue
processing time specified in advance.

9. The method of claim 1, wherein the retrieval key
indicates an attribute value of one attribute of the target
information, and the related information requested by the
related information query of the step (d) is an attribute
value of another attribute of the target information other
than the one attribute.

10. The method of claim 9, wherein attributes of the
target information are hierarchically ordered, and the
another attribute is a hierarchically adjacent one of the
one attribute.

11. The method of claim 9, wherein the another attribute
is selected to be an attribute having attribute value
candidates that constitute a number of data that can be
processed by the speech recognition processing in the
prescribed processing time.

12. The method of claim 1, wherein the step (a) stores the
retrieval key candidates indicating attribute values of a
plurality of attributes of the target information, such
that the retrieval key entered by the user can indicate an
attribute value of any one of the plurality of attributes.

13. The method of claim 1, wherein the step (a) stores the
retrieval key candidates as lower level data, and also
stores higher level data that constitute a number of data
that can be processed by the speech recognition processing
in the prescribed processing time, where each lower level
data is dependent on one higher level data and lower level
data that are dependent on one higher level data constitute
a number of data that can be processed by the speech
recognition processing in the prescribed processing time.

14. The method of claim 13, wherein the step (c) judges
that the recognition result satisfies the prescribed
condition when the retrieval key can be determined by a
number of confirmation queries less than or equal to a
prescribed number.

15. The method of claim 13, wherein the step (d) judges
that the recognition result does not satisfy the prescribed
condition when the user negated the prescribed number of
the confirmation queries.

16. The method of claim 13, wherein the related
information requested by the related information query of
the step (d) is a higher level data indicating a generic
concept to which a specific concept indicated by the
retrieval key belongs.

17. The method of claim 16, wherein the step (e) adjusts
the recognition result by carrying out another confirmation
process using a speech dialogue with the user according to
the another recognition result to determine the higher
level data, extracting the lower level data that are
dependent on determined higher level data as new
recognition target data, carrying out the speech
recognition processing for the speech input with respect to
the new recognition target data to obtain the another
recognition result.

18. A method of speech recognition based interactive
information retrieval for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising the steps of:

(a) storing retrieval key candidates that are classified
according to attribute values of an attribute item in a
speech recognition database;

(b) requesting the user by a speech dialogue with the user
to enter a speech input indicating an attribute value of
the attribute item for the retrieval key, and carrying out
the speech recognition processing for the speech input to
obtain a recognition result indicating attribute value
candidates and their recognition likelihoods;

(c) selecting those attribute value candidates which have
recognition likelihoods that are exceeding a prescribed
likelihood threshold as attribute value leading candidates,
and extracting those retrieval key candidates that belong
to the attribute value leading candidates as new
recognition target data;

(d) requesting the user by a speech dialogue with the user
to enter another speech input indicating the retrieval key,
and carrying out the speech recognition processing for the
another speech input with respect to the new recognition
target data to obtain another recognition result; and

(e) carrying out a confirmation process using a speech
dialogue with the user according to the another recognition
result to determine the retrieval key.
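
Step (c) can be sketched as follows: the attribute value is not
confirmed with the user; instead, every attribute value above the
threshold contributes its retrieval key candidates to the new
recognition target data. The function name, the keys_by_attr mapping
and the sample values in the usage note are hypothetical.

```python
from typing import Dict, Set


def narrow_by_attribute(attr_scores: Dict[str, float],
                        keys_by_attr: Dict[str, Set[str]],
                        threshold: float) -> Set[str]:
    """Union of the retrieval key candidates that belong to each attribute
    value whose recognition likelihood exceeds the threshold."""
    leading = [a for a, p in attr_scores.items() if p > threshold]
    targets: Set[str] = set()
    for attr in leading:
        targets |= keys_by_attr.get(attr, set())
    return targets
```

For example, if the attribute values "Tokyo" and "Kyoto" both exceed
the threshold, the retrieval keys classified under either one become
the recognition target for the next speech input of step (d).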



19. The method of claim 18, wherein the attribute item is
selected to be an attribute having attribute value
candidates that constitute a number of data that can be
processed by the speech recognition processing in a
prescribed processing time.

20. A method of speech recognition based interactive
information retrieval for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising the steps of:

(a) storing retrieval key candidates that constitute a
number of data that cannot be processed by the speech
recognition processing in a prescribed processing time as
recognition target words, in a plurality of statistically
hierarchized databases provided in a speech recognition
database, where lower level statistically hierarchized
databases contain increasingly larger part of the retrieval
key candidates such that a lowest level statistically
hierarchized database contains all the retrieval key
candidates;

(b) requesting the user by a speech dialogue with the user
to enter a speech input indicating the retrieval key, and
carrying out the speech recognition processing for the
speech input with respect to all of the plurality of
statistically hierarchized databases in parallel, to
sequentially obtain respective recognition results
indicating recognition retrieval key candidates and their
recognition likelihoods;

(c) selecting those recognition retrieval key candidates
which have recognition likelihoods that are exceeding a
prescribed likelihood threshold as recognition retrieval
key leading candidates, for each statistically hierarchized
database for which the speech recognition processing is
completed; and

(d) controlling a next speech dialogue with the user
according to whether a prescribed condition that a number
of the recognition retrieval key leading candidates is less
than or equal to a prescribed number but not zero is
satisfied or not.
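
The nested structure of step (a) can be sketched as follows. Ranking
by access frequency is an assumption made for the example (the claim
itself only requires that lower levels contain increasingly larger
parts of the candidates), and the name build_levels and the sizes
parameter are hypothetical.

```python
from typing import Dict, List


def build_levels(access_freq: Dict[str, int],
                 sizes: List[int]) -> List[List[str]]:
    """Return databases from highest to lowest level. Each level is a
    prefix of the ranked candidates, and the lowest level holds them all."""
    ranked = sorted(access_freq, key=lambda w: access_freq[w], reverse=True)
    # Sizes that would already cover every candidate are skipped, so the
    # complete list appears exactly once, as the lowest level.
    levels = [ranked[:n] for n in sorted(sizes) if 0 < n < len(ranked)]
    levels.append(ranked)
    return levels
```

Because each level is contained in the next lower one, a recognition
result obtained early from a small higher level database can drive the
dialogue while recognition against the larger levels is still running.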

21. The method of claim 20, wherein the step (d) further
comprises the sub-steps of:

(d1) carrying out a related information query using a
speech dialogue with the user to request the user to enter
another speech input indicating a related information of
the retrieval key, when the prescribed condition is
satisfied;

(d2) carrying out the speech recognition processing for
the another speech input to obtain another recognition
result indicating recognition related information
candidates and their recognition likelihoods, and adjusting
the recognition result according to the another recognition
result to obtain adjusted recognition result; and

(d3) carrying out a confirmation process using a speech
dialogue with the user according to the adjusted
recognition result to determine the retrieval key.

22. The method of claim 21, wherein the step (d2) adjusts
the recognition result by calculating new recognition
likelihoods for the recognition retrieval key candidates
according to recognition likelihoods for the recognition
retrieval key candidates indicated in the recognition
result and recognition likelihoods for the recognition
related information candidates indicated in the another
recognition result.

23. The method of claim 22, wherein the step (d2)
calculates the new recognition likelihoods for the
recognition retrieval key candidates by normalizing the
recognition likelihoods for the recognition retrieval key
candidates indicated in the recognition result, normalizing
the recognition likelihoods for the recognition related
information candidates indicated in the another recognition
result, and multiplying a normalized recognition likelihood
of each recognition retrieval key candidate with a
normalized recognition likelihood of a corresponding
recognition related information candidate that is found to
be related to each recognition retrieval key candidate.
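
The normalized multiplication can be sketched as follows. The claim
does not fix a normalization method; dividing by the sum of the
likelihoods is an assumption made here, as are the function names and
the related_of mapping.

```python
from typing import Dict


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale the likelihoods so that they sum to 1 (assumed method)."""
    total = sum(scores.values())
    if total <= 0:
        return {k: 0.0 for k in scores}
    return {k: v / total for k, v in scores.items()}


def adjust_normalized(key_scores: Dict[str, float],
                      related_scores: Dict[str, float],
                      related_of: Dict[str, str]) -> Dict[str, float]:
    """Multiply normalized likelihoods, keeping only the key candidates
    that have a recognized related information candidate."""
    norm_keys = normalize(key_scores)
    norm_rel = normalize(related_scores)
    adjusted: Dict[str, float] = {}
    for key, p_key in norm_keys.items():
        rel = related_of.get(key)
        if rel in norm_rel:
            adjusted[key] = p_key * norm_rel[rel]
    return adjusted
```

Normalizing first puts the two recognition results on a common scale
before they are combined, which matters when they come from target
sets of different sizes.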

24. The method of claim 21, further comprising the step
of:

(e) checking whether any of prescribed next dialogue
leading conditions is satisfied or not, and shifting a
recognition target to a next lower level statistically
hierarchized database when any of the prescribed next
dialogue leading conditions is satisfied.

25. The method of claim 24, further comprising the steps
of:

(f) adjusting the recognition result for the next lower
level statistically hierarchized database according to a
related information of the retrieval key to obtain another
adjusted recognition result;

(g) selecting those recognition retrieval key candidates
which have recognition likelihoods that are exceeding the
prescribed likelihood threshold as recognition retrieval
key leading candidates, from the another adjusted
recognition result; and

(h) controlling a next speech dialogue with the user
according to whether the prescribed condition that a number
of recognition retrieval key leading candidates is less
than or equal to a prescribed number but not zero is
satisfied or not.

26. The method of claim 25, wherein the related
information used at the step (f) is information already
obtained before the step (e) in a course of processing a
higher level statistically hierarchized database.

27. The method of claim 25, wherein the related
information used at the step (f) is obtained by carrying
out a related information query using a speech dialogue
with the user to request the user to enter another speech
input for a related information of the retrieval key, when
no related information of the retrieval key is obtained
yet.

28. The method of claim 24, wherein the prescribed next
dialogue leading conditions include:

(1) a case where the number of the recognition retrieval
key leading candidates is not less than or equal to the
prescribed number;

(2) a case where the number of the recognition retrieval
key leading candidates is zero;

(3) a case where a recognition retrieval key candidate
presented to the user in the confirmation process of the
step (d3) according to the adjusted recognition result is
negated by the user; and

(4) a case where no recognition retrieval key leading
candidates is found to be related to the recognition
related information candidates obtained by the speech
recognition processing of the step (d2).
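
The four conditions can be combined into a single check for shifting
to the next lower level database. The function and parameter names are
hypothetical, and how each input is obtained is left outside this
sketch.

```python
def should_shift_level(num_leading: int,
                       max_leading: int,
                       negated_by_user: bool,
                       any_related_match: bool) -> bool:
    """True when any of the next dialogue leading conditions holds."""
    return (
        num_leading > max_leading   # (1) more than the prescribed number
        or num_leading == 0         # (2) no leading candidate at all
        or negated_by_user          # (3) presented candidate was negated
        or not any_related_match    # (4) no leading candidate matches the
                                    #     recognized related information
    )
```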

29. The method of claim 20, wherein the step (a) stores
the retrieval key candidates in the plurality of
statistically hierarchized databases, such that an (n+1)-th
level statistically hierarchized database contains a number
of the retrieval key candidates that can be processed by
the speech recognition processing while carrying out
speech dialogues with the user to determine the retrieval
key using an n-th level statistically hierarchized
database.

30. The method of claim 20, wherein the step (a) stores
the retrieval key candidates in the plurality of
statistically hierarchized databases according to
importance levels based on statistical information defined
for the recognition target words, such that the recognition
target words in a higher level statistically hierarchized
database have relatively higher importance level than the
recognition target words in a lower level statistically
hierarchized database.

31. A speech recognition based interactive information
retrieval apparatus for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising:

a speech recognition database configured to store
retrieval key candidates that constitute a number of data
that cannot be processed by the speech recognition
processing in a prescribed processing time, as recognition
target words, the recognition target words being divided
into prioritized recognition target words that constitute a
number of data that can be processed by the speech
recognition processing in the prescribed processing time
and that have relatively higher importance levels based on
statistical information defined for the recognition target
words, and non-prioritized recognition target words other
than the prioritized recognition target words;

a speech recognition unit configured to carry out the
speech recognition processing; and

a dialogue control unit configured to carry out speech
dialogues with the user;

wherein the dialogue control unit carries out a speech
dialogue for requesting the user to enter a speech input
indicating the retrieval key, such that the speech
recognition unit carries out the speech recognition
processing for the speech input with respect to the
prioritized recognition target words to obtain a
recognition result;

the dialogue control unit carries out a speech
dialogue for a confirmation process according to the
recognition result to determine the retrieval key, when the
recognition result satisfies a prescribed condition for
judging that the retrieval key can be determined only by a
confirmation process with the user;

the dialogue control unit carries out a speech
dialogue for a related information query to request the
user to enter another speech input indicating a related
information of the retrieval key, when the recognition
result does not satisfy the prescribed condition, such that
the speech recognition unit carries out the speech
recognition processing for the another speech input to
obtain another recognition result and the dialogue control
unit adjusts the recognition result according to the
another recognition result to obtain adjusted recognition
result, and

the dialogue control unit controls the speech
dialogues to repeat the confirmation process or the related
information query using the adjusted recognition result in
place of the recognition result, until the retrieval key is
determined.

32. The apparatus of claim 31, wherein the speech
recognition unit also carries out the speech recognition
processing for the speech input with respect to as many of
the non-prioritized recognition target words as a number of
data that can be processed by the speech recognition
processing in the prescribed processing time to obtain
additional recognition result, while the dialogue control
unit is carrying out the related information query using
the speech dialogue with the user, and

the dialogue control unit also adjusts the recognition
result by adding the additional recognition result.

33. The apparatus of claim 32, wherein the speech
recognition database stores the non-prioritized recognition
target words that are subdivided into a plurality of sets
each containing a number of recognition target words that
can be processed by the speech recognition processing in
the prescribed processing time, and

the speech recognition unit carries out the speech
recognition processing for the speech input with respect to
the plurality of sets in an order of the importance levels
of the recognition target words contained in each set.

34. The apparatus of claim 31, wherein the speech
recognition unit obtains the recognition result that
indicates recognition retrieval key candidates and their
recognition likelihoods and the another recognition result
that indicates recognition related information candidates
and their recognition likelihoods, and

the dialogue control unit adjusts the recognition
result by calculating new recognition likelihoods for the
recognition retrieval key candidates according to
recognition likelihoods for the recognition retrieval key
candidates indicated in the recognition result and
recognition likelihoods for the recognition related
information candidates indicated in the another recognition
result.

35. The apparatus of claim 34, wherein the dialogue
control unit calculates the new recognition likelihoods for
the recognition retrieval key candidates by multiplying a
recognition likelihood of each recognition retrieval key
candidate with a recognition likelihood of a corresponding
recognition related information candidate.

36. The apparatus of claim 31, wherein the speech
recognition unit obtains the recognition result that
indicates recognition retrieval key candidates and their
recognition likelihoods, and

the dialogue control unit judges that the recognition
result satisfies the prescribed condition, when a number of
recognition retrieval key leading candidates which have
recognition likelihoods that are exceeding a prescribed
likelihood threshold is less than or equal to a prescribed
number but not zero.

37. The apparatus of claim 31, wherein the statistical
information used in the speech recognition database is
access frequencies of the retrieval key candidates.

38. The apparatus of claim 31, wherein the prescribed
processing time used in the speech recognition database is
a real dialogue processing time specified in advance.

39. The apparatus of claim 31, wherein the retrieval key
indicates an attribute value of one attribute of the target
information, and the related information requested by the
related information query carried out by the dialogue
control unit is an attribute value of another attribute of
the target information other than the one attribute.

40. The apparatus of claim 39, wherein attributes of the
target information are hierarchically ordered, and the
another attribute is a hierarchically adjacent one of the
one attribute.

41. The apparatus of claim 39, wherein the another
attribute is selected to be an attribute having attribute
value candidates that constitute a number of data that can
be processed by the speech recognition processing in the
prescribed processing time.

42. The apparatus of claim 31, wherein the speech
recognition database stores the retrieval key candidates
indicating attribute values of a plurality of attributes of
the target information, such that the retrieval key entered
by the user can indicate an attribute value of any one of
the plurality of attributes.

43. The apparatus of claim 31, wherein the speech
recognition database stores the retrieval key candidates as
lower level data, and also stores higher level data that
constitute a number of data that can be processed by the
speech recognition processing in the prescribed processing
time, where each lower level data is dependent on one
higher level data and lower level data that are dependent
on one higher level data constitute a number of data that
can be processed by the speech recognition processing in
the prescribed processing time.

44. The apparatus of claim 43, wherein the dialogue
control unit judges that the recognition result satisfies
the prescribed condition when the retrieval key can be
determined by a number of confirmation queries less than or
equal to a prescribed number.

45. The apparatus of claim 43, wherein the dialogue
control unit judges that the recognition result does not
satisfy the prescribed condition when the user negated the
prescribed number of the confirmation queries.

46. The apparatus of claim 43, wherein the related
information requested by the related information query
carried out by the dialogue control unit is a higher level
data indicating a generic concept to which a specific
concept indicated by the retrieval key belongs.

47. The apparatus of claim 46, wherein the dialogue
control unit adjusts the recognition result by carrying out
another confirmation process using a speech dialogue with
the user according to the another recognition result to
determine the higher level data, extracting the lower level
data that are dependent on determined higher level data as
new recognition target data, carrying out the speech
recognition processing for the speech input with respect to
the new recognition target data to obtain the another
recognition result.

48. A speech recognition based interactive information
retrieval apparatus for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising:

a speech recognition database configured to store
retrieval key candidates that are classified according to
attribute values of an attribute item;

a speech recognition unit configured to carry out the 
speech recognition processing; and 

a dialogue control unit configured to carry out speech
dialogues with the user;

wherein the dialogue control unit carries out a speech
dialogue for requesting the user to enter a speech input
indicating an attribute value of the attribute item for the
retrieval key, such that the speech recognition unit
carries out the speech recognition processing for the
speech input to obtain a recognition result indicating
attribute value candidates and their recognition
likelihoods;

the dialogue control unit selects those attribute
value candidates which have recognition likelihoods that
are exceeding a prescribed likelihood threshold as
attribute value leading candidates, and extracts those
retrieval key candidates that belong to the attribute value
leading candidates as new recognition target data;

the dialogue control unit carries out a speech
dialogue for requesting the user to enter another speech
input indicating the retrieval key, such that the speech
recognition unit carries out the speech recognition
processing for the another speech input with respect to the
new recognition target data to obtain another recognition
result; and

the dialogue control unit carries out a speech
dialogue for a confirmation process according to the
another recognition result to determine the retrieval key.

49. The apparatus of claim 48, wherein the attribute item
is selected to be an attribute having attribute value
candidates that constitute a number of data that can be
processed by the speech recognition processing in a
prescribed processing time.

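For illustration only, the narrowing step recited in claims 48 and 49 can be sketched in Python as follows. The function name, the dictionary layout, the place names, the likelihood values and the threshold of 0.5 are all hypothetical and are not taken from the disclosure. The sketch assumes that the speech recognition processing has already returned one likelihood per attribute value candidate.

```python
def narrow_recognition_targets(attribute_likelihoods, keys_by_attribute, threshold):
    """Return (attribute value leading candidates, new recognition target data)."""
    # Leading candidates: attribute values whose likelihood exceeds the threshold.
    leading = [value for value, likelihood in attribute_likelihoods.items()
               if likelihood > threshold]
    # New recognition target data: retrieval key candidates that belong to
    # any of the attribute value leading candidates.
    targets = []
    for value in leading:
        targets.extend(keys_by_attribute.get(value, []))
    return leading, targets


# Hypothetical example: prefectures as attribute values, cities as retrieval keys.
keys_by_attribute = {
    "Kanagawa": ["Yokosuka", "Chigasaki"],
    "Kyoto": ["Kizu"],
    "Tokyo": ["Shinjuku"],
}
attribute_likelihoods = {"Kanagawa": 0.82, "Kyoto": 0.61, "Tokyo": 0.20}
leading, targets = narrow_recognition_targets(
    attribute_likelihoods, keys_by_attribute, threshold=0.5)
```

In this sketch the second speech input would be recognized only against `targets`, which is what keeps the number of recognition target data within the prescribed processing time.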

50. A speech recognition based interactive information
retrieval apparatus for ascertaining and retrieving a
target information of a user by determining a retrieval key
entered by the user using a speech recognition processing,
comprising:

a speech recognition database having a plurality of
statistically hierarchized databases configured to store
retrieval key candidates that constitute a number of data
that cannot be processed by the speech recognition
processing in a prescribed processing time as recognition
target words, where lower level statistically hierarchized
databases contain increasingly larger part of the retrieval
key candidates such that a lowest level statistically
hierarchized database contains all the retrieval key
candidates;

a speech recognition unit configured to carry out the 
speech recognition processing; and 

a dialogue control unit configured to carry out speech 
dialogues with the user; 

wherein the dialogue control unit carries out a speech
dialogue for requesting the user to enter a speech input
indicating the retrieval key, such that the speech
recognition unit carries out the speech recognition
processing for the speech input with respect to all of the
plurality of statistically hierarchized databases in
parallel, to sequentially obtain respective recognition
results indicating recognition retrieval key candidates and
their recognition likelihoods;

the dialogue control unit selects those recognition
retrieval key candidates which have recognition likelihoods
that are exceeding a prescribed likelihood threshold as
recognition retrieval key leading candidates, for each
statistically hierarchized database for which the speech
recognition processing is completed; and

the dialogue control unit controls a next speech
dialogue with the user according to whether a prescribed
condition that a number of the recognition retrieval key
leading candidates is less than or equal to a prescribed
number but not zero is satisfied or not.

51. The apparatus of claim 50, wherein the dialogue
control unit controls the next speech dialogue by:

carrying out a speech dialogue for a related
information query to request the user to enter another
speech input indicating a related information of the
retrieval key, when the prescribed condition is satisfied,
such that the speech recognition unit carries out the
speech recognition processing for the another speech input
to obtain another recognition result indicating recognition
related information candidates and their recognition
likelihoods;

adjusting the recognition result according to the
another recognition result to obtain adjusted recognition
result; and

carrying out a speech dialogue for a confirmation
process according to the adjusted recognition result to
determine the retrieval key.

52. The apparatus of claim 51, wherein the dialogue
control unit adjusts the recognition result by calculating
new recognition likelihoods for the recognition retrieval
key candidates according to recognition likelihoods for the
recognition retrieval key candidates indicated in the
recognition result and recognition likelihoods for the
recognition related information candidates indicated in the
another recognition result.

53. The apparatus of claim 52, wherein the dialogue
control unit calculates the new recognition likelihoods for
the recognition retrieval key candidates by normalizing the
recognition likelihoods for the recognition retrieval key
candidates indicated in the recognition result, normalizing
the recognition likelihoods for the recognition related
information candidates indicated in the another recognition
result, and multiplying a normalized recognition likelihood
of each recognition retrieval key candidate with a
normalized recognition likelihood of a corresponding
recognition related information candidate that is found to
be related to each recognition retrieval key candidate.

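For illustration only, the likelihood adjustment recited in claims 52 and 53 can be sketched in Python as follows. The claims do not state how the normalization is performed; this sketch assumes division by the sum of the likelihoods in each recognition result. It further assumes that a retrieval key candidate with no related information candidate in the another recognition result receives a new likelihood of zero. The function names, the `related_of` mapping and all numeric values are hypothetical.

```python
def normalize(likelihoods):
    """Scale likelihoods so that they sum to 1 (assumed normalization)."""
    total = sum(likelihoods.values())
    if total == 0:
        return {name: 0.0 for name in likelihoods}
    return {name: value / total for name, value in likelihoods.items()}


def adjust_likelihoods(key_likelihoods, related_likelihoods, related_of):
    """Multiply each normalized key likelihood by the normalized likelihood
    of the related information candidate that the key is related to."""
    norm_keys = normalize(key_likelihoods)
    norm_related = normalize(related_likelihoods)
    adjusted = {}
    for key, value in norm_keys.items():
        related = related_of.get(key)
        # Assumption: no matching related information candidate gives 0.
        adjusted[key] = value * norm_related.get(related, 0.0)
    return adjusted


# Hypothetical example: cities as retrieval keys, prefectures as related information.
key_likelihoods = {"Kizu": 3.0, "Yokosuka": 1.0}
related_likelihoods = {"Kanagawa": 7.0, "Kyoto": 1.0}
related_of = {"Kizu": "Kyoto", "Yokosuka": "Kanagawa"}
adjusted = adjust_likelihoods(key_likelihoods, related_likelihoods, related_of)
best = max(adjusted, key=adjusted.get)
```

With these hypothetical values the candidate ranked second in the first recognition result becomes the top candidate after adjustment, which is the effect the related information query is intended to have.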

54. The apparatus of claim 51, wherein the dialogue
control unit also checks whether any of prescribed next
dialogue leading conditions is satisfied or not, and shifts
a recognition target to a next lower level statistically
hierarchized database when any of the prescribed next
dialogue leading conditions is satisfied.

55. The apparatus of claim 54, wherein the dialogue
control unit adjusts the recognition result for the next
lower level statistically hierarchized database according
to a related information of the retrieval key to obtain
another adjusted recognition result, selects those
recognition retrieval key candidates which have recognition
likelihoods that are exceeding the prescribed likelihood
threshold as recognition retrieval key leading candidates,
from the another adjusted recognition result, and controls
a next speech dialogue with the user according to whether
the prescribed condition that a number of recognition
retrieval key leading candidates is less than or equal to a
prescribed number but not zero is satisfied or not.

56. The apparatus of claim 55, wherein the related
information used in adjusting the recognition result for
the next lower level statistically hierarchized database is
information already obtained before shifting the
recognition target to the next lower level statistically
hierarchized database in a course of processing a higher
level statistically hierarchized database.

57. The apparatus of claim 55, wherein the related 
information used in adjusting the recognition result for
the next lower level statistically hierarchized database is 
obtained by carrying out a speech dialogue for a related 
information query to request the user to enter another 
speech input for a related information of the retrieval 
key, when no related information of the retrieval key is 
obtained yet. 

58. The apparatus of claim 54, wherein the prescribed next
dialogue leading conditions include:

(1) a case where the number of the recognition retrieval
key leading candidates is not less than or equal to the
prescribed number;

(2) a case where the number of the recognition retrieval
key leading candidates is zero;

(3) a case where a recognition retrieval key candidate
presented to the user in the confirmation process according
to the adjusted recognition result is negated by the user;
and

(4) a case where no recognition retrieval key leading
candidate is found to be related to the recognition
related information candidates obtained by the speech
recognition processing.
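
For illustration only, the prescribed condition of claim 50 and the four next dialogue leading conditions of claim 58 can be sketched in Python as two predicates. The function and parameter names are hypothetical; the sketch assumes the caller already knows the number of leading candidates, whether the user negated the presented candidate, and whether any leading candidate matched the related information candidates.

```python
def prescribed_condition(num_leading, prescribed_number):
    """Claim 50: the number of leading candidates is less than or equal
    to the prescribed number but not zero."""
    return 0 < num_leading <= prescribed_number


def should_shift_to_lower_level(num_leading, prescribed_number,
                                negated_by_user, related_match_found):
    """Claims 54 and 58: shift the recognition target to the next lower
    level database when any of conditions (1) to (4) holds."""
    return (
        num_leading > prescribed_number   # (1) too many leading candidates
        or num_leading == 0               # (2) no leading candidate
        or negated_by_user                # (3) presented candidate negated
        or not related_match_found        # (4) no candidate related to the
                                          #     related information candidates
    )
```

Conditions (1) and (2) together are the negation of the prescribed condition, so in this sketch a shift occurs either when the prescribed condition fails or when the subsequent dialogue fails under (3) or (4).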

59. The apparatus of claim 50, wherein the speech
recognition database stores the retrieval key candidates in
the plurality of statistically hierarchized databases, such
that an (n+1)-th level statistically hierarchized database
contains a number of the retrieval key candidates that can
be processed by the speech recognition processing while
carrying out speech dialogues with the user to determine
the retrieval key using an n-th level statistically
hierarchized database.

60. The apparatus of claim 50, wherein the speech
recognition database stores the retrieval key candidates in
the plurality of statistically hierarchized databases
according to importance levels based on statistical
information defined for the recognition target words, such
that the recognition target words in a higher level
statistically hierarchized database have relatively higher
importance level than the recognition target words in a
lower level statistically hierarchized database.

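For illustration only, one possible construction of the statistically hierarchized databases of claims 50 and 60 can be sketched in Python as follows. The sketch assumes that each lower level database also contains the words of the levels above it, which is one reading of "increasingly larger part", and that the importance level is a single number per word. The function name, the level sizes and the importance values are hypothetical.

```python
def build_hierarchized_databases(importance_by_word, level_sizes):
    """Return a list of word lists, highest level first.

    level_sizes gives the number of words in each level above the lowest;
    the lowest level always contains all retrieval key candidates.
    """
    # Rank words from highest to lowest importance level.
    ranked = sorted(importance_by_word, key=importance_by_word.get, reverse=True)
    levels = [ranked[:size] for size in level_sizes]
    levels.append(ranked)  # lowest level: every retrieval key candidate
    return levels


# Hypothetical importance levels, e.g. past access counts.
importance_by_word = {"Tokyo": 1000, "Yokosuka": 120, "Chigasaki": 80, "Kizu": 15}
levels = build_hierarchized_databases(importance_by_word, level_sizes=[1, 3])
```

Under claim 59, the size of each level would be chosen so that the (n+1)-th level can be processed while the dialogue using the n-th level is still in progress; the sizes used here are placeholders.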

61. A computer usable medium having computer readable
program codes embodied therein for causing a computer to
function as a speech recognition based interactive
information retrieval system for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing and a speech recognition database
for storing retrieval key candidates that constitute a
number of data that cannot be processed by the speech
recognition processing in a prescribed processing time, as
recognition target words in a speech recognition database,
the recognition target words being divided into prioritized
recognition target words that constitute a number of data
that can be processed by the speech recognition processing
in the prescribed processing time which have relatively
higher importance levels based on statistical information
defined for the recognition target words, and
non-prioritized recognition target words other than the
prioritized recognition target words, the computer readable
program codes include:

a first computer readable program code for causing
said computer to request the user by a speech dialogue with
the user to enter a speech input indicating the retrieval
key, and carry out the speech recognition processing for
the speech input with respect to the prioritized
recognition target words to obtain a recognition result;

a second computer readable program code for causing
said computer to carry out a confirmation process using a
speech dialogue with the user according to the recognition
result to determine the retrieval key, when the recognition
result satisfies a prescribed condition for judging that
the retrieval key can be determined only by a confirmation
process with the user;

a third computer readable program code for causing
said computer to carry out a related information query
using a speech dialogue with the user to request the user
to enter another speech input indicating a related
information of the retrieval key, when the recognition
result does not satisfy the prescribed condition;

a fourth computer readable program code for causing
said computer to carry out the speech recognition
processing for the another speech input to obtain another
recognition result, and adjust the recognition result
according to the another recognition result to obtain
adjusted recognition result; and

a fifth computer readable program code for causing
said computer to repeat processing of the second computer
readable program code or the third and fourth computer
readable program codes using the adjusted recognition
result in place of the recognition result, until the
retrieval key is determined.

62. A computer usable medium storing a data structure to
be used as a speech recognition database in a speech
recognition based interactive information retrieval system
for ascertaining and retrieving a target information of a
user by determining a retrieval key entered by the user
using a speech recognition processing, the data structure
comprising:

retrieval key candidates that constitute a number of
data that cannot be processed by the speech recognition
processing in a prescribed processing time, as recognition
target words, the recognition target words being divided
into prioritized recognition target words that constitute a
number of data that can be processed by the speech
recognition processing in the prescribed processing time
which have relatively higher importance levels based on
statistical information defined for the recognition target
words, and non-prioritized recognition target words other
than the prioritized recognition target words.

63. The computer usable medium of claim 62, wherein the
data structure stores the retrieval key candidates as lower
level data, and also stores higher level data that
constitute a number of data that can be processed by the
speech recognition processing in the prescribed processing
time, where each lower level data is dependent on one
higher level data and lower level data that are dependent
on one higher level data constitute a number of data that
can be processed by the speech recognition processing in
the prescribed processing time.
64. A computer usable medium having computer readable
program codes embodied therein for causing a computer to 
function as a speech recognition based interactive
information retrieval system for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing and a speech recognition database
for storing retrieval key candidates that are classified
according to attribute values of an attribute item, the
computer readable program codes include:

a first computer readable program code for causing
said computer to request the user by a speech dialogue with
the user to enter a speech input indicating an attribute
value of the attribute item for the retrieval key, and
carry out the speech recognition processing for the speech
input to obtain a recognition result indicating attribute
value candidates and their recognition likelihoods;

a second computer readable program code for causing
said computer to select those attribute value candidates
which have recognition likelihoods that are exceeding a
prescribed likelihood threshold as attribute value leading
candidates, and extract those retrieval key candidates that
belong to the attribute value leading candidates as new
recognition target data;

a third computer readable program code for causing
said computer to request the user by a speech dialogue with
the user to enter another speech input indicating the
retrieval key, and carry out the speech recognition
processing for the another speech input with respect to the
new recognition target data to obtain another recognition
result; and

a fourth computer readable program code for causing
said computer to carry out a confirmation process using a
speech dialogue with the user according to the another
recognition result to determine the retrieval key.

65. A computer usable medium having computer readable
program codes embodied therein for causing a computer to
function as a speech recognition based interactive
information retrieval system for ascertaining and
retrieving a target information of a user by determining a
retrieval key entered by the user using a speech
recognition processing and a speech recognition database
having a plurality of statistically hierarchized databases
for storing retrieval key candidates that constitute a
number of data that cannot be processed by the speech
recognition processing in a prescribed processing time as
recognition target words, where lower level statistically
hierarchized databases contain increasingly larger part of
the retrieval key candidates such that a lowest level
statistically hierarchized database contains all the
retrieval key candidates, the computer readable program
codes include:

a first computer readable program code for causing
said computer to request the user by a speech dialogue with
the user to enter a speech input indicating the retrieval
key, and carry out the speech recognition processing for
the speech input with respect to all of the plurality of
statistically hierarchized databases in parallel, to
sequentially obtain respective recognition results
indicating recognition retrieval key candidates and their
recognition likelihoods;

a second computer readable program code for causing
said computer to select those recognition retrieval key
candidates which have recognition likelihoods that are
exceeding a prescribed likelihood threshold as recognition
retrieval key leading candidates, for each statistically
hierarchized database for which the speech recognition
processing is completed; and

a third computer readable program code for causing
said computer to control a next speech dialogue with the
user according to whether a prescribed condition that a
number of the recognition retrieval key leading candidates
is less than or equal to a prescribed number but not zero
is satisfied or not.
ABSTRACT OF THE DISCLOSURE

In the disclosed speech recognition based interactive
information retrieval scheme, the recognition target words
in the speech recognition database are divided into
prioritized recognition target words that constitute a
number of data that can be processed by the speech
recognition processing in the prescribed processing time
and that have relatively higher importance levels based on
statistical information, and the other non-prioritized
recognition target words. Then, the speech recognition
processing for the speech input with respect to the
prioritized recognition target words is carried out at
higher priority, and a confirmation process is carried out
when the recognition result satisfies a prescribed
condition for judging that the retrieval key can be
determined only by a confirmation process with the user. On
the other hand, a related information query to request the
user to enter another speech input for a related
information of the retrieval key is carried out when the
recognition result does not satisfy the prescribed
condition, and the original recognition result is adjusted
according to the recognition result for another speech
input. In this way, the retrieval key determination is
realized through natural speech dialogues with the user.
Attorney Docket No. 13700-0240

Title: Speech Recognition Based Interactive Information Retrieval Scheme.
Page 2

Full name of sole or first inventor: Kumiko OHMORI
Citizenship: Japan
Residence: [illegible], Nakajima, Chigasaki-shi, Kanagawa-ken, Japan
Post Office Address: c/o NIPPON TELEGRAPH AND TELEPHONE CORPORATION
20-2, Nishi-Shinjuku 3-chome, Shinjuku-ku, Tokyo 163-1419 Japan
Inventor's signature: [illegible]

Full name of second joint inventor: Masanobu HIGASHIDA
Citizenship: Japan
Residence: 1-2-15-105, Kabutodai, Kizucho, Soraku-gun, Kyoto-fu, Japan
Post Office Address: c/o NIPPON TELEGRAPH AND TELEPHONE CORPORATION
20-2, Nishi-Shinjuku 3-chome, Shinjuku-ku, Tokyo 163-1419 Japan
Inventor's signature: [illegible]    Date: May 24, 2000

Full name of third joint inventor, if any: Noriko MIZUSAWA
Citizenship: Japan
Residence: [illegible]-507, Shioiri-cho, Yokosuka-shi, Kanagawa-ken, Japan
Post Office Address: c/o NIPPON TELEGRAPH AND TELEPHONE CORPORATION
20-2, Nishi-Shinjuku 3-chome, Shinjuku-ku, Tokyo 163-1419 Japan
Inventor's signature: [illegible]    Date: May 24, 2000